# Introduction to pandas

* Pandas is a fast, flexible, open-source library for data analysis and data manipulation, built on top of the Python programming language.
* It is a high-level data manipulation tool; Wes McKinney began developing it in 2008.
* It is built on the NumPy package, and its key data structure is the DataFrame, a two-dimensional table with labeled rows and columns.
* It is used for data manipulation, data analysis, data cleaning and basic data visualization.
* It is widely used in data science, machine learning and data analytics.
* It handles datasets that fit in memory efficiently, because most column operations are vectorized rather than written as Python loops.
* It can read data from various sources, such as CSV, Excel, JSON and HTML files and SQL databases.

1. [Create a Series](#create-a-series)
2. [Create a DataFrame](#create-a-dataframe)
3. [Read a CSV File](#read-a-csv-file)
4. [Slicing and Indexing](#slicing-and-indexing-a-dataframe)
5. [Select a Column](#select-a-column)
6. [Basic Statistics](#basic-statistics)
7. [Apply a Function](#apply-a-function)
8. [Masking](#masking)
9. [Group-by](#group-by)


In [67]:
import pandas as pd
import seaborn as sns

## Create a series
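
A Series is a one-dimensional labeled array, and each column of a DataFrame is a Series. The following is a minimal sketch; the values and the index labels are made up for illustration.

```python
import pandas as pd

# a Series built from a list gets a default integer index 0, 1, 2, ...
ages = pd.Series([23, 45, 32], name='age')

# an explicit index attaches a label to each value (labels are illustrative)
ages_by_name = pd.Series([23, 45, 32], index=['John', 'Smith', 'Paul'], name='age')

print(ages.iloc[0])            # look up a value by its position
print(ages_by_name['Smith'])   # look up a value by its label
```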

## Create a dataframe

In [68]:
# create a dataframe
df = pd.DataFrame({
    'name': ['John', 'Smith', 'Paul'],
    'age': [23, 45, 32],
    'city': ['New York', 'Chicago', 'Los Angeles']
})

In [69]:
df

| | name | age | city |
|---|---|---|---|
| 0 | John | 23 | New York |
| 1 | Smith | 45 | Chicago |
| 2 | Paul | 32 | Los Angeles |
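
After building a dataframe it is usually worth checking its size and column types. A small sketch that rebuilds the same dataframe so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Smith', 'Paul'],
    'age': [23, 45, 32],
    'city': ['New York', 'Chicago', 'Los Angeles']
})

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels in order
print(df.dtypes)            # one dtype per column
```

Each dictionary key becomes a column label, and each list becomes that column's values, so all lists must have the same length.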


## Read a CSV file

In [70]:

# load the iris example dataset; seaborn downloads it as a CSV file and returns a DataFrame
iris = sns.load_dataset('iris')

In [71]:
# show the first five rows
iris.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
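
The cell above loads iris through seaborn. To read a CSV file of your own, `pd.read_csv` is the usual entry point. The sketch below uses an in-memory string in place of a file so that it runs anywhere; the path `iris.csv` mentioned in the comment is hypothetical.

```python
import io
import pandas as pd

# an in-memory stand-in for a CSV file on disk; with a real file you would
# pass its path instead, e.g. pd.read_csv('iris.csv') (hypothetical path)
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

flowers = pd.read_csv(io.StringIO(csv_text))

print(flowers.head(2))   # first two rows
print(flowers.tail(1))   # last row
```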


## Slicing and indexing a dataframe

In [72]:
# slice by integer position: rows 0-4 and columns 0-1 (the end of each range is excluded)
iris.iloc[0:5, 0:2]

| | sepal_length | sepal_width |
|---|---|---|
| 0 | 5.1 | 3.5 |
| 1 | 4.9 | 3.0 |
| 2 | 4.7 | 3.2 |
| 3 | 4.6 | 3.1 |
| 4 | 5.0 | 3.6 |


In [73]:
# slice by label: rows 0-5 and columns sepal_length to petal_width (both ends are included)
iris.loc[0:5, 'sepal_length':'petal_width']

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 |
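
The two outputs above differ in length because `iloc` slices by position and excludes the end of the range, while `loc` slices by label and includes it. A small sketch on a made-up dataframe shows the difference:

```python
import pandas as pd

# illustrative data with the default integer index 0..3
df = pd.DataFrame(
    {'a': [10, 20, 30, 40], 'b': [1, 2, 3, 4], 'c': [5, 6, 7, 8]}
)

by_position = df.iloc[0:2, 0:2]   # rows at positions 0-1, columns at positions 0-1
by_label = df.loc[0:2, 'a':'b']   # rows labeled 0-2, columns 'a' through 'b'

print(by_position.shape)
print(by_label.shape)
```

With the default index the row labels happen to equal the positions, which is why the same `0:2` selects two rows with `iloc` and three rows with `loc`.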


## Select a column

In [74]:
# attribute access returns the column as a Series
iris.species

0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object

In [75]:
# bracket access with a single label also returns a Series
iris['species']

0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object

In [76]:
# a list of labels returns a DataFrame, even for a single column
iris[['species']]

| | species |
|---|---|
| 0 | setosa |
| 1 | setosa |
| 2 | setosa |
| 3 | setosa |
| 4 | setosa |
| ... | ... |
| 145 | virginica |
| 146 | virginica |
| 147 | virginica |
| 148 | virginica |
| 149 | virginica |
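
The three selections above return two different types. A short sketch on a made-up dataframe makes the distinction explicit:

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Smith', 'Paul'], 'age': [23, 45, 32]})

as_series = df['age']     # single label   -> Series (one-dimensional)
as_frame = df[['age']]    # list of labels -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Attribute access such as `df.age` is a shorthand for `df['age']`, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket access is the safer default.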


## Basic statistics

In [77]:
# summary statistics for a single numeric column
iris.petal_length.describe()

count    150.000000
mean       3.758000
std        1.765298
min        1.000000
25%        1.600000
50%        4.350000
75%        5.100000
max        6.900000
Name: petal_length, dtype: float64

In [78]:
# first and third quartiles
iris.petal_length.quantile([0.25, 0.75])

0.25    1.6
0.75    5.1
Name: petal_length, dtype: float64

In [79]:
# min and max, median, mean
iris.petal_length.min(), iris.petal_length.max(), iris.petal_length.median(), iris.petal_length.mean()

(1.0, 6.9, 4.35, 3.7580000000000005)

In [80]:
# pairwise Pearson correlation between two columns
iris[["sepal_length", "petal_width"]].corr()

| | sepal_length | petal_width |
|---|---|---|
| sepal_length | 1.0 | 0.817941 |
| petal_width | 0.817941 | 1.0 |


In [81]:
iris[["petal_length", "petal_width"]].corr()

| | petal_length | petal_width |
|---|---|---|
| petal_length | 1.0 | 0.962865 |
| petal_width | 0.962865 | 1.0 |
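
By default `corr()` computes the Pearson coefficient, which is the covariance of the two columns divided by the product of their standard deviations. The sketch below checks that relationship on made-up numbers; the column names `x` and `y` are illustrative.

```python
import pandas as pd

# illustrative, nearly linear data
df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.1, 3.9, 6.2, 8.1, 9.8],
})

r_builtin = df['x'].corr(df['y'])

# the same coefficient computed from its definition
r_manual = df['x'].cov(df['y']) / (df['x'].std() * df['y'].std())

print(r_builtin, r_manual)
```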


## Apply a function
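
`apply` runs a function on every value of a Series, or on every row or column of a DataFrame. A minimal sketch, reusing the small people dataframe from earlier; the new column names `age_in_months` and `label` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Smith', 'Paul'],
    'age': [23, 45, 32],
})

# apply a function to every value of one column (a Series)
df['age_in_months'] = df['age'].apply(lambda years: years * 12)

# apply a function to every row of the dataframe with axis=1
df['label'] = df.apply(lambda row: f"{row['name']} ({row['age']})", axis=1)

print(df)
```

For simple arithmetic like this, the vectorized form `df['age'] * 12` gives the same result and is usually faster; `apply` is most useful when the logic cannot be expressed with built-in column operations.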

## Masking
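
A mask is a boolean Series with one True or False value per row; indexing a dataframe with it keeps only the rows where the mask is True. A minimal sketch, reusing the small people dataframe from earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Smith', 'Paul'],
    'age': [23, 45, 32],
    'city': ['New York', 'Chicago', 'Los Angeles']
})

# a comparison on a column produces a boolean Series
older_than_30 = df['age'] > 30
print(df[older_than_30])

# combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
mask = (df['age'] > 30) & (df['city'] != 'Chicago')
print(df[mask])
```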

## Group-by
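
`groupby` splits the rows into groups that share a key, applies an aggregation to each group, and combines the results. A minimal sketch on made-up data; the `sales` dataframe and its columns are hypothetical.

```python
import pandas as pd

# illustrative data: several amounts per city
sales = pd.DataFrame({
    'city': ['New York', 'Chicago', 'New York', 'Chicago', 'Los Angeles'],
    'amount': [10, 20, 30, 40, 50],
})

# one aggregate per group
mean_by_city = sales.groupby('city')['amount'].mean()
print(mean_by_city)

# several aggregates at once
summary = sales.groupby('city')['amount'].agg(['count', 'sum'])
print(summary)
```

The same pattern applies to the iris data, for example `iris.groupby('species')['petal_length'].mean()` to compare the average petal length of each species.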