## Importing pandas

### Getting started and checking your pandas setup

**1.** Import pandas under the alias `pd`.

In [88]:
import pandas as pd
import numpy as np

**2.** Print the version of pandas that has been imported.

In [89]:
pd.__version__

'1.3.4'

In [123]:
## pd.show_versions()

**3.** Look up the built-in help for any pandas function.
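
No answer cell is recorded for this exercise. As a minimal sketch, Python's built-in `help()` prints the docstring of any pandas object; `DataFrame.head` is used here only as an arbitrary example.

```python
import pandas as pd

# help() prints the full docstring of any pandas function or method.
help(pd.DataFrame.head)

# The same text is available as a plain string through __doc__;
# here we keep only its first non-empty line.
summary = pd.DataFrame.head.__doc__.strip().splitlines()[0]
print(summary)
```

In a notebook, `pd.DataFrame.head?` opens the same documentation in a side pane.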

## DataFrame basics

### A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames


Consider the following Python dictionary `data` and Python list `labels`:

``` python
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

**4.** Create a DataFrame `df` from the dictionary `data`, using `labels` as the index.

In [91]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
data

{'animal': ['cat',
  'cat',
  'snake',
  'dog',
  'dog',
  'cat',
  'snake',
  'cat',
  'dog',
  'dog'],
 'age': [2.5, 3, 0.5, nan, 5, 2, 4.5, nan, 7, 3],
 'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
 'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

In [92]:
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
labels

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [93]:
df = pd.DataFrame(data=data, index=labels)
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,2.0,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no


**5.** Display a summary of the basic information about this DataFrame and its data (*hint: there is a single method that can be called on the DataFrame*).

In [94]:
df.describe(include = 'all')

Unnamed: 0,animal,age,visits,priority
count,10,8.0,10.0,10
unique,3,,,2
top,cat,,,no
freq,4,,,6
mean,,3.4375,1.9,
std,,2.007797,0.875595,
min,,0.5,1.0,
25%,,2.375,1.0,
50%,,3.0,2.0,
75%,,4.625,2.75,
max,,7.0,3.0,
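
`describe` reports summary statistics. The single method the hint most likely points to is `DataFrame.info`, which lists the index, column dtypes and non-null counts. A minimal sketch, rebuilding the same `df` so the block runs on its own and capturing the output in a buffer so it can be inspected:

```python
import io

import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# info() writes to stdout by default; passing a buffer captures the text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)
```

In a notebook, calling `df.info()` directly is enough.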


**6.** Return the first 3 rows of the DataFrame `df`.

In [95]:
df.head(3)

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no


**7.** Select just the 'animal' and 'age' columns from the DataFrame `df`.

In [96]:
df[['animal','age']]

Unnamed: 0,animal,age
a,cat,2.5
b,cat,3.0
c,snake,0.5
d,dog,
e,dog,5.0
f,cat,2.0
g,snake,4.5
h,cat,
i,dog,7.0
j,dog,3.0


**8.** Select the data in rows `[3, 4, 8]` *and* in columns `['animal', 'age']`.

In [129]:
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

Unnamed: 0,animal,age
d,dog,
e,dog,5.0
i,dog,7.0
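
Row positions in pandas are zero-based, so `[3, 4, 8]` refers to the rows labelled `d`, `e` and `i`. As a sketch of purely positional selection, `iloc` picks the rows by position and a column list then narrows the result:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# iloc selects by zero-based position; the column list selects by name.
subset = df.iloc[[3, 4, 8]][['animal', 'age']]
print(subset)
```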


**9.** Select only the rows where the number of visits is greater than 3.

In [97]:
df[df['visits'] > 3]

Unnamed: 0,animal,age,visits,priority


**10.** Check for missing values in the data.

In [98]:
df.isnull()

Unnamed: 0,animal,age,visits,priority
a,False,False,False,False
b,False,False,False,False
c,False,False,False,False
d,False,True,False,False
e,False,False,False,False
f,False,False,False,False
g,False,False,False,False
h,False,True,False,False
i,False,False,False,False
j,False,False,False,False


In [99]:
df.isnull().sum()

animal      0
age         2
visits      0
priority    0
dtype: int64

**11.** Select the rows where the animal is a cat *and* the age is less than 3.

In [100]:
df[(df['animal']=='cat') & (df['age'] < 3)]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
f,cat,2.0,3,no


**12.** Select the rows where the age is between 2 and 4 (inclusive).

In [101]:
df[(df['age'] >= 2) & (df['age'] <= 4)]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
f,cat,2.0,3,no
j,dog,3.0,1,no
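
An inclusive range test can also be written with `Series.between`, which includes both endpoints by default and treats missing ages as non-matching. A minimal sketch on the original data (before row `f` is changed):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# between(2, 4) is True for 2 <= age <= 4 and False for NaN.
in_range = df[df['age'].between(2, 4)]
print(in_range)
```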


**13.** Change the age in row 'f' to 1.5.

In [102]:
df.loc['f','age'] = 1.5
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no


**14.** Calculate the sum of all visits in `df` (i.e. find the total number of visits).

In [103]:
df['visits'].sum()

19

**15.** Calculate the mean age for each different animal in `df`. Explore the groupby function.

In [104]:
df.groupby(by='animal').mean(numeric_only=True)

animal,age,visits
cat,2.333333,2.0
dog,5.0,2.0
snake,2.5,1.5


In [105]:
df.groupby('animal').sum(numeric_only=True)

animal,age,visits
cat,7.0,8
dog,15.0,8
snake,5.0,3
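
Since the exercise asks only for the mean age, the grouped result can be narrowed to that one column before aggregating. This also avoids trying to average the text column `priority`. A sketch, with row `f` already set to 1.5 as in exercise 13:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)
df.loc['f', 'age'] = 1.5  # same change as in exercise 13

# Select the 'age' column of each group, then average it (NaN is skipped).
mean_age = df.groupby('animal')['age'].mean()
print(mean_age)
```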


**16.** Append a new row 'k' to `df` with your choice of values for each column. Then delete that row to return the original DataFrame.

In [106]:
df.loc['k'] = ['snake',4.0,5,'no']
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
k,snake,4.0,5,no


In [107]:
df.drop(columns='priority')  # drop a column; returns a new DataFrame

Unnamed: 0,animal,age,visits
a,cat,2.5,1
b,cat,3.0,3
c,snake,0.5,2
d,dog,,3
e,dog,5.0,2
f,cat,1.5,3
g,snake,4.5,1
h,cat,,1
i,dog,7.0,2
j,dog,3.0,1
k,snake,4.0,5


In [108]:
df.drop(labels='k')  # drop a row (rows are the default axis); returns a new DataFrame, so df itself still contains 'k'

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
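
`drop` returns a new DataFrame and leaves `df` untouched unless the result is assigned back. A minimal sketch of the full round trip (add row `k`, then remove it for good), using the same illustrative values for the new row:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# Assigning to a label that does not exist yet appends a row.
df.loc['k'] = ['snake', 4.0, 5, 'no']
rows_after_add = len(df)

# Reassign the result of drop() so that row 'k' is actually removed from df.
df = df.drop(index='k')
print(df.index.tolist())
```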


**17.** Count the number of each type of animal in `df`.

In [109]:
len(df['animal'].unique())

3
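
`len(df['animal'].unique())` gives the number of distinct animals. For a count per animal type, which is what the exercise asks for, `Series.value_counts` is one option. A sketch on the original ten rows (without row `k`):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# value_counts() tallies how often each distinct value occurs.
counts = df['animal'].value_counts()
print(counts)
```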

**18.** Sort `df` first by the values in the 'age' column in *descending* order, then by the values in the 'visits' column in *ascending* order (so row `i` should be first, and row `d` should be last).

In [110]:
df.sort_values('age',ascending = False)

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,no
e,dog,5.0,2,no
g,snake,4.5,1,no
k,snake,4.0,5,no
b,cat,3.0,3,yes
j,dog,3.0,1,no
a,cat,2.5,1,yes
f,cat,1.5,3,no
c,snake,0.5,2,no
d,dog,,3,yes
h,cat,,1,yes


In [111]:
df.sort_values('visits')

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
g,snake,4.5,1,no
h,cat,,1,yes
j,dog,3.0,1,no
c,snake,0.5,2,no
e,dog,5.0,2,no
i,dog,7.0,2,no
b,cat,3.0,3,yes
d,dog,,3,yes
f,cat,1.5,3,no
k,snake,4.0,5,no
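
The two cells above sort by one column each. To apply both keys in a single sort, as the exercise describes, `sort_values` accepts a list of columns and a matching list of directions. A minimal sketch on the original ten rows:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# 'age' descending is the primary key; 'visits' ascending breaks ties.
# Rows with a missing age are placed last by default.
ordered = df.sort_values(by=['age', 'visits'], ascending=[False, True])
print(ordered)
```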


**19.** The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be `True` and 'no' should be `False`.

In [117]:
df['priority'] = df['priority'].map({'yes':True , 'no':False})
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,True
b,cat,3.0,3,True
c,snake,0.5,2,False
d,dog,,3,True
e,dog,5.0,2,False
f,cat,1.5,3,False
g,snake,4.5,1,False
h,cat,,1,True
i,dog,7.0,2,False
j,dog,3.0,1,False
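
With `map`, any value missing from the dictionary becomes `NaN`. When the column holds only 'yes' and 'no', a plain comparison is an alternative sketch that always yields a boolean column:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# True where the value equals 'yes', False for everything else.
df['priority'] = df['priority'] == 'yes'
print(df['priority'])
```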


**20.** In the 'animal' column, change the 'snake' entries to 'python'.

In [122]:
df.replace({'animal': {'snake': 'python'}})  # returns a new DataFrame; df itself is unchanged

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,True
b,cat,3.0,3,True
c,python,0.5,2,False
d,dog,,3,True
e,dog,5.0,2,False
f,cat,1.5,3,False
g,python,4.5,1,False
h,cat,,1,True
i,dog,7.0,2,False
j,dog,3.0,1,False
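
`replace` returns a new object, so the change is lost unless it is stored. A minimal sketch that writes the result back to the 'animal' column:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# Replace within the column and assign the result back to keep the change.
df['animal'] = df['animal'].replace('snake', 'python')
print(df['animal'].tolist())
```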
