# <span style="color:blue"> **Pandas - Basics** </span>
### Exercise 00: Series

- Create a one-dimensional labeled array capable of holding any data type using a Series.

  Output:
  
      #> A 3
      #> B 5
      #> C 7
      #> D 4
    

In [27]:
import pandas as pd
import numpy as np

data = np.array([3,5,7,4])

serie = pd.Series(data, index=['A','B','C','D'])

print(serie)

# Pandas Series is a one-dimensional labeled array capable of holding data of any type
# (integer, string, float, python objects, etc.).

A    3
B    5
C    7
D    4
dtype: int64
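
As a minimal sketch of the "any data type" point above, the same labels can hold values of mixed types. The variable name `mixed` and its values are illustrative assumptions, not part of the exercise.

```python
import pandas as pd

# Same labels as in the exercise, but with values of mixed Python types
# (illustrative values, not taken from the exercise).
mixed = pd.Series([3, "five", 7.0, None], index=["A", "B", "C", "D"])

print(mixed.dtype)    # the values share no single numeric type, so pandas falls back to object
print(mixed["B"])     # access by label
print(mixed.iloc[0])  # access by integer position
```

A Series built from a homogeneous NumPy array, as in the exercise, keeps a numeric dtype such as `int64` instead.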


### Exercise 01: DataFrame

- Create a two-dimensional labeled data structure with columns of potentially different types using a DataFrame.

  Example: 

      data = {'Country': ['Belgium', 'India', 'Brazil'],
      'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
      'Population': [11190846, 1303171035, 207847528]}

In [28]:
data = {'Name':['Maria', 'Amit', 'Sergio', 'Karla'],'Age':[28,34,29,42]}

df = pd.DataFrame(data)

print(df)

     Name  Age
0   Maria   28
1    Amit   34
2  Sergio   29
3   Karla   42
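
The example dictionary from the exercise statement can be loaded the same way. This sketch, which assumes the name `countries` for the resulting frame, shows that each column keeps its own dtype.

```python
import pandas as pd

# Example data from the exercise statement
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}

countries = pd.DataFrame(data)

# One dtype per column: text columns and an integer column side by side
print(countries.dtypes)
```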


### Exercise 02: Display summary

- Display a summary of the basic information about this DataFrame and its data.

        data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

        labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [99]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

print(df, "\n")


# The info() function is used to print a concise summary of a DataFrame. 
# This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.

df.info(verbose=True)


# The describe() function is used to generate descriptive statistics that summarize the central tendency,
# dispersion and shape of a dataset’s distribution, excluding NaN values.


df.describe(include='all')

# df.describe(include=[np.number])

# df.describe(include=[object])  # the np.object alias was removed in NumPy 1.24



  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no 

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int64
priority    10 non-null object
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes


       animal       age    visits priority
count      10       8.0      10.0       10
unique      3       NaN       NaN        2
top       dog       NaN       NaN       no
freq        4       NaN       NaN        6
mean      NaN    3.4375       1.9      NaN
std       NaN  2.007797  0.875595      NaN
min       NaN       0.5       1.0      NaN
25%       NaN     2.375       1.0      NaN
50%       NaN       3.0       2.0      NaN
75%       NaN     4.625      2.75      NaN
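
Two narrower summaries can complement `info()` and `describe(include='all')`. The sketch below rebuilds the exercise data; the names `missing` and `numeric_summary` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Number of missing values per column
missing = df.isna().sum()
print(missing, "\n")

# Descriptive statistics restricted to the numeric columns
numeric_summary = df.describe(include=[np.number])
print(numeric_summary)
```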


### Exercise 03: Displaying rows

- Return the first 3 rows of the DataFrame df.

In [100]:
# DataFrame.iloc[] selects rows by integer position, whatever the index labels are.
# That suits this DataFrame, whose index holds the labels 'a' to 'j' rather than 0, 1, 2, ...

data = df.iloc[0:3]

print(data)

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
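
For comparison, here is a small sketch of three ways to get the same rows. It uses only the first five rows and two columns of the exercise data, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# First five rows of the exercise data, enough to compare the selectors
df = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5]},
    index=['a', 'b', 'c', 'd', 'e'])

by_position = df.iloc[0:3]   # positions 0, 1, 2; the end position is excluded
by_head = df.head(3)         # shorthand for the first n rows
by_label = df.loc['a':'c']   # label slice; the end label is included

print(by_position.equals(by_head) and by_position.equals(by_label))
```

Note the asymmetry: a positional slice excludes its end, while a label slice includes it.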


### Exercise 04: Retrieving data

- Select the rows where the age is between 2 and 4 (inclusive).

In [101]:
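# inclusive='both' keeps both endpoints (pandas >= 1.3; the boolean form was removed in pandas 2.0)
df = df[df['age'].between(2, 4, inclusive='both')]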

print(df)

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
f    cat  2.0       3       no
j    dog  3.0       1       no
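
The same selection can be written as an explicit boolean mask, which may be easier to adapt when the two bounds need different treatment. This is a sketch on the `animal` and `age` columns of the exercise data; `mask` and `selected` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3]},
    index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Comparisons against NaN evaluate to False, so rows with a missing age drop out
mask = (df['age'] >= 2) & (df['age'] <= 4)
selected = df[mask]

print(selected)
```

The parentheses around each comparison are required because `&` binds more tightly than `>=` and `<=`.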


### Exercise 05: Adding and dropping data

- Append a new row ’k’ to df with your choice of values for each column. Then delete
that row to return the original DataFrame.

In [104]:
data2 = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df1 = pd.DataFrame(data2, index=labels)

# Add row 'k' with pd.concat(); DataFrame.append() was removed in pandas 2.0
new_row = pd.DataFrame({'animal': 'cat', 'age': 6, 'visits': 5, 'priority': 'yes'}, index=['k'])
modDf = pd.concat([df1, new_row])
print(modDf, "\n")

# Delete row 'k' by its index label to get the original DataFrame back
modDf = modDf.drop('k')
print(modDf)

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
k    cat  6.0       5      yes 

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no


### Exercise 06: Calculate

- Calculate the mean age for each different animal in df.

In [105]:
average = df1.groupby(['animal'])['age'].mean()
print(average)


animal
cat      2.5
dog      5.0
snake    2.5
Name: age, dtype: float64
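
To see how many values each mean is based on, the group can be aggregated with several functions at once. This sketch uses the `animal` and `age` columns of the exercise data; the name `stats` is an assumption.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3]},
    index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# mean and count both skip NaN, so count reports the non-missing ages per animal
stats = df1.groupby('animal')['age'].agg(['mean', 'count'])

print(stats)
```

There are four cats in the data, but one has a missing age, so the cat mean is computed over three values.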


### Exercise 07: Sorting

- Sort df first by the values in the 'age' column in descending order, then by the values in the
'visits' column in ascending order.

In [108]:
# Named sorted_df so that the built-in sorted() is not shadowed
sorted_df = df1.sort_values(['age', 'visits'], ascending=[False, True])
print(sorted_df)

  animal  age  visits priority
i    dog  7.0       2       no
e    dog  5.0       2       no
g  snake  4.5       1       no
j    dog  3.0       1       no
b    cat  3.0       3      yes
a    cat  2.5       1      yes
f    cat  2.0       3       no
c  snake  0.5       2       no
h    cat  NaN       1      yes
d    dog  NaN       3      yes
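
In the output above, the rows with a missing age land at the bottom, which is the default behaviour of `sort_values`. As a sketch, the `na_position` parameter moves them to the top instead; the name `nan_first` is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
     'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
     'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Same sort keys as in the exercise, with missing ages placed first
nan_first = df1.sort_values(['age', 'visits'],
                            ascending=[False, True],
                            na_position='first')

print(nan_first)
```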
