## Pandas

<b>Pandas</b> is often expanded as ```Python Data Analysis Library```; the name itself is derived from the econometrics term "panel data".

<b>Pandas is an open-source Python library that provides</b>
- high performance
- easy-to-use data structures
- data analysis tools

<b>Pandas is used in both academic and commercial domains for a wide variety of tasks</b>
- statistics
- analytics
- finance
- machine learning
- artificial intelligence
- deep learning
- data science


<b>Key features of pandas</b>
- Easy data alignment
- Fancy indexing
- Flexibility in working with large data
- Easy handling of missing data
- Helpful in time series analysis
- Easy to merge, insert and delete columns and rows using DataFrame
- Easy to read large datasets
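To make "easy data alignment" and "easy handling of missing data" concrete, here is a minimal sketch (the Series names and values are made up for illustration). Arithmetic between two Series lines values up by index label, and labels that exist on only one side become `NaN`, which can then be counted or filled.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (illustrative data)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition aligns on index labels; a label present in only one Series gives NaN
total = a + b
print(total)

# The missing values can then be counted and replaced
print(total.isna().sum())
filled = total.fillna(0)
print(filled)
```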

NOTE: We'll look at more of pandas' features and capabilities in the upcoming modules.

## What we're going to look at in pandas
- What is a DataFrame?
- What is a Series?
- What operations can we perform with pandas?

## Installing pandas

In [1]:
!pip install pandas



## Importing pandas

In [2]:
import pandas as pd
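A small optional check after importing: printing the installed version is a quick way to confirm the import worked, and it helps when a tutorial's output differs slightly from yours, since some behaviour changes between pandas releases.

```python
import pandas as pd

# pd is the conventional alias; __version__ is the installed release as a string
print(pd.__version__)
```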

<b>What is a DataFrame?</b> The DataFrame is the main object in pandas. It represents two-dimensional data organized in labeled rows and columns (tabular data, similar to an Excel spreadsheet).

In [3]:
# Creating a DataFrame
import numpy as np
import pandas as pd

data = np.arange(0,20).reshape(5,4)

df = pd.DataFrame(data,
                 index=['r1','r2','r3','r4','r5'],
                 columns=['c1','c2','c3','c4'])

df

| | c1 | c2 | c3 | c4 |
|---|---|---|---|---|
| r1 | 0 | 1 | 2 | 3 |
| r2 | 4 | 5 | 6 | 7 |
| r3 | 8 | 9 | 10 | 11 |
| r4 | 12 | 13 | 14 | 15 |
| r5 | 16 | 17 | 18 | 19 |
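The `index` and `columns` arguments above are not just display names; they become the labels used to look data up. A minimal sketch rebuilding the same DataFrame; `.loc` (label-based lookup) is not introduced in this notebook and is used here only to show the labels in action.

```python
import numpy as np
import pandas as pd

data = np.arange(0, 20).reshape(5, 4)
df = pd.DataFrame(data,
                  index=['r1', 'r2', 'r3', 'r4', 'r5'],
                  columns=['c1', 'c2', 'c3', 'c4'])

# The labels passed to the constructor become the row and column indexes
print(list(df.index))    # ['r1', 'r2', 'r3', 'r4', 'r5']
print(list(df.columns))  # ['c1', 'c2', 'c3', 'c4']

# .loc looks up a single value by row label and column label
print(df.loc['r2', 'c3'])  # 6
```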


In [4]:
# Another example
df = pd.DataFrame({
    'Day':[1,2,3,4,5,6,7],
    'Day Name':['Mon','Tues','Wed','Thur','Fri','Sat','Sun'],
    'Weather': ['rain','rain','cloudy','windy','sunny','sunny','windy']
})

df

| | Day | Day Name | Weather |
|---|---|---|---|
| 0 | 1 | Mon | rain |
| 1 | 2 | Tues | rain |
| 2 | 3 | Wed | cloudy |
| 3 | 4 | Thur | windy |
| 4 | 5 | Fri | sunny |
| 5 | 6 | Sat | sunny |
| 6 | 7 | Sun | windy |


In [5]:
df.shape

(7, 3)

In [6]:
df.index

RangeIndex(start=0, stop=7, step=1)
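`df.shape` is a `(rows, columns)` tuple, and since no `index` was passed, pandas numbered the rows with a default `RangeIndex` from 0 to n-1. A small sketch unpacking both, using the same weather data:

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5, 6, 7],
    'Day Name': ['Mon', 'Tues', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun'],
    'Weather': ['rain', 'rain', 'cloudy', 'windy', 'sunny', 'sunny', 'windy'],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 7 3

# len() gives the row count; the default RangeIndex labels rows 0..n-1
print(len(df))         # 7
print(list(df.index))  # [0, 1, 2, 3, 4, 5, 6]
```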

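In practice we usually work with large datasets, so a DataFrame can have many rows. Rather than displaying all of them, we can look at just a part:

- head(): view the first n rows of the DataFrame (n=5 by default)
- tail(): view the last n rows of the DataFrame (n=5 by default)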

In [7]:
df.head()

| | Day | Day Name | Weather |
|---|---|---|---|
| 0 | 1 | Mon | rain |
| 1 | 2 | Tues | rain |
| 2 | 3 | Wed | cloudy |
| 3 | 4 | Thur | windy |
| 4 | 5 | Fri | sunny |


In [8]:
df.tail(2)

| | Day | Day Name | Weather |
|---|---|---|---|
| 5 | 6 | Sat | sunny |
| 6 | 7 | Sun | windy |
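With only seven rows, `head()` and `tail()` barely trim anything. A sketch on a hypothetical 100-row DataFrame (the `value` column is made up) shows the default of 5 rows and the effect of passing `n`:

```python
import pandas as pd

# A hypothetical 100-row DataFrame, just to have more rows than fit on screen
big = pd.DataFrame({'value': list(range(100))})

first = big.head()  # default n=5
last = big.tail(3)  # last 3 rows

print(len(first))                # 5
print(list(first['value']))      # [0, 1, 2, 3, 4]
print(list(last['value']))       # [97, 98, 99]
```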


In [9]:
df.columns

Index(['Day', 'Day Name', 'Weather'], dtype='object')

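Accessing a specific column: use attribute syntax (df.Day) or bracket syntax (df['Weather']). Attribute syntax only works when the column name is a valid Python identifier, so a column such as 'Day Name' needs brackets.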

In [10]:
df.Day

0    1
1    2
2    3
3    4
4    5
5    6
6    7
Name: Day, dtype: int64

In [11]:
df['Weather']

0      rain
1      rain
2    cloudy
3     windy
4     sunny
5     sunny
6     windy
Name: Weather, dtype: object
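A quick sketch comparing the two access styles on a trimmed copy of the weather data: both return the same Series, but only bracket access can reach a column whose name contains a space.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3],
    'Day Name': ['Mon', 'Tues', 'Wed'],
})

# Attribute access and bracket access return the same Series
print(df.Day.equals(df['Day']))  # True

# 'Day Name' contains a space, so df.Day Name would be a syntax error;
# bracket access is the only option here
print(df['Day Name'].tolist())   # ['Mon', 'Tues', 'Wed']
```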

In [12]:
# Accessing multiple columns: pass a list of column names
df[['Day Name','Weather']]

| | Day Name | Weather |
|---|---|---|
| 0 | Mon | rain |
| 1 | Tues | rain |
| 2 | Wed | cloudy |
| 3 | Thur | windy |
| 4 | Fri | sunny |
| 5 | Sat | sunny |
| 6 | Sun | windy |


Selecting rows by position with slicing: df[start:stop] returns the rows from start up to, but not including, stop.

In [13]:
df[0:]

| | Day | Day Name | Weather |
|---|---|---|---|
| 0 | 1 | Mon | rain |
| 1 | 2 | Tues | rain |
| 2 | 3 | Wed | cloudy |
| 3 | 4 | Thur | windy |
| 4 | 5 | Fri | sunny |
| 5 | 6 | Sat | sunny |
| 6 | 7 | Sun | windy |


In [14]:
df[0:4]

| | Day | Day Name | Weather |
|---|---|---|---|
| 0 | 1 | Mon | rain |
| 1 | 2 | Tues | rain |
| 2 | 3 | Wed | cloudy |
| 3 | 4 | Thur | windy |


In [15]:
df[2:5]

| | Day | Day Name | Weather |
|---|---|---|---|
| 2 | 3 | Wed | cloudy |
| 3 | 4 | Thur | windy |
| 4 | 5 | Fri | sunny |
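The plain `df[2:5]` slice is positional and excludes the stop position. A sketch comparing it with the explicit `.iloc` (position-based) and `.loc` (label-based) indexers, which are not covered above; note that `.loc` slices include the end label, so `loc[2:4]` selects the same rows here only because the default index labels equal the positions.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5, 6, 7],
    'Weather': ['rain', 'rain', 'cloudy', 'windy', 'sunny', 'sunny', 'windy'],
})

# Positional slice: rows at positions 2, 3 and 4 (5 is excluded)
by_slice = df[2:5]

# .iloc makes the positional slicing explicit
by_iloc = df.iloc[2:5]

# .loc slices by label and INCLUDES the end label 4
by_loc = df.loc[2:4]

print(by_slice.equals(by_iloc))  # True
print(by_slice.equals(by_loc))   # True
print(by_slice['Day'].tolist())  # [3, 4, 5]
```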


### Types
- Series: a one-dimensional labeled array, i.e. a single column of data
- DataFrame: a two-dimensional table made up of one or more columns

In [16]:
type(df['Day'])

pandas.core.series.Series

In [17]:
type(df[['Day','Weather']])

pandas.core.frame.DataFrame
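What decides the type is the bracket form, not the number of columns: a single name gives a Series, while a list of names gives a DataFrame even if the list has only one entry. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3],
    'Weather': ['rain', 'rain', 'cloudy'],
})

# Single brackets with one name -> Series (one-dimensional)
s = df['Day']
# Double brackets (a list of names) -> DataFrame, even with a single column
one_col = df[['Day']]

print(type(s).__name__, s.ndim)              # Series 1
print(type(one_col).__name__, one_col.ndim)  # DataFrame 2
print(one_col.shape)                         # (3, 1)
```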

### Adding columns to a DataFrame

In [18]:
df['Temperature'] = [25,28,30,23,34,33,24]
df

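| | Day | Day Name | Weather | Temperature |
|---|---|---|---|---|
| 0 | 1 | Mon | rain | 25 |
| 1 | 2 | Tues | rain | 28 |
| 2 | 3 | Wed | cloudy | 30 |
| 3 | 4 | Thur | windy | 23 |
| 4 | 5 | Fri | sunny | 34 |
| 5 | 6 | Sat | sunny | 33 |
| 6 | 7 | Sun | windy | 24 |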
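Besides assigning a list, a new column can be computed from existing ones, and pandas applies the arithmetic to every row at once. A minimal sketch; the notebook does not state a unit, so treating Temperature as degrees Celsius and the column name 'Temperature F' are assumptions for illustration. It also shows that a plain list must have exactly one value per row.

```python
import pandas as pd

df = pd.DataFrame({
    'Day Name': ['Mon', 'Tues', 'Wed'],
    'Temperature': [25, 28, 30],
})

# Derived column, computed element-wise.
# Assumption for illustration: Temperature is in degrees Celsius.
df['Temperature F'] = df['Temperature'] * 9 / 5 + 32

# A list whose length differs from the number of rows raises ValueError
try:
    df['Bad'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)

print(df)
```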


### Operations with pandas DataFrame

In [19]:
# Taking mean of temperature
df.Temperature.mean()

28.142857142857142

In [20]:
# Bracket notation gives the same result
df['Temperature'].mean()

28.142857142857142

In [21]:
# Min value of temperature
df.Temperature.min()

23

In [22]:
# Max value of temperature
df.Temperature.max()

34

In [23]:
# Standard deviation of temperature (sample standard deviation, ddof=1)
df.Temperature.std()

4.375255094603872
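`std()` returns the sample standard deviation by default, dividing by n - 1 (`ddof=1`). A sketch that recomputes it by hand from the same temperatures and compares it with the population version (`ddof=0`, dividing by n):

```python
import math
import pandas as pd

temps = pd.Series([25, 28, 30, 23, 34, 33, 24], name='Temperature')

# Sample standard deviation computed by hand: divide by (n - 1)
n = len(temps)
mean = temps.sum() / n
sample_std = math.sqrt(((temps - mean) ** 2).sum() / (n - 1))

print(temps.std())        # pandas default, ddof=1
print(sample_std)         # same value as temps.std()
print(temps.std(ddof=0))  # population standard deviation (divide by n)
```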

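### Basic statistics with a single pandas function

The ```describe()``` method computes the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum of every numeric column in one call.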

In [24]:
df.describe()

| | Day | Temperature |
|---|---|---|
| count | 7.0 | 7.0 |
| mean | 4.0 | 28.142857 |
| std | 2.160247 | 4.375255 |
| min | 1.0 | 23.0 |
| 25% | 2.5 | 24.5 |
| 50% | 4.0 | 28.0 |
| 75% | 5.5 | 31.5 |
| max | 7.0 | 34.0 |
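The summary above skips the text columns and matches the single-column methods used earlier. A sketch checking both points on the same data; `describe(include='all')` is an existing option that also summarizes non-numeric columns (count, unique, top, freq).

```python
import math
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5, 6, 7],
    'Weather': ['rain', 'rain', 'cloudy', 'windy', 'sunny', 'sunny', 'windy'],
    'Temperature': [25, 28, 30, 23, 34, 33, 24],
})

summary = df.describe()

# Only numeric columns are summarized by default; 'Weather' is left out
print(list(summary.columns))  # ['Day', 'Temperature']

# Each summary row agrees with the corresponding single-column method
t = df['Temperature']
print(math.isclose(summary.loc['mean', 'Temperature'], t.mean()))
print(math.isclose(summary.loc['25%', 'Temperature'], t.quantile(0.25)))

# include='all' adds statistics for text columns, e.g. the number of distinct values
print(df.describe(include='all').loc['unique', 'Weather'])  # 4
```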
