# Pandas

#### Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

In [1]:
import pandas as pd

# Data Structures

#### Pandas introduces three data structures to Python, all of which are built on top of NumPy (this means it’s fast).

1) Series
2) DataFrame
3) Panel (deprecated in pandas 0.20 and removed in 0.25, so it is not available in current releases)
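
As a rough illustration of the first two structures, the sketch below builds a Series and a DataFrame by hand. The names `marks` and `students` and all of the values are made up for this example; they do not come from any file used in this notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical values).
marks = pd.Series([82, 75, 91], index=['a', 'b', 'c'])

# A DataFrame is a two-dimensional table; each column is itself a Series.
students = pd.DataFrame({
    'Name': ['Jeevan', 'Sameer'],
    'Marks': [82, 75],
})

print(marks['b'])               # look up a value by its label
print(students.shape)           # (number of rows, number of columns)
print(type(students['Name']))   # selecting one column gives a Series
```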

# DataFrame

#### A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or an SQL table.

In [4]:
# Creating an empty dataframe
df=pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [5]:
# Reading from CSV File
df=pd.read_csv('StudentInfo.csv')
print(df)

   Sl.No    Name         USN   Mobile No                Email Id
0      1  Jeevan  4MW15CS400  9892323231   jeevan.cs@sode-edu.in
1      2  Sameer  4MW15CS401  9987453621   sameer.cs@sode-edu.in
2      3  Jeevan  4MW15CS402  9894563231  jeevan1.cs@sode-edu.in


In [6]:
print(df['Name'])

0    Jeevan
1    Sameer
2    Jeevan
Name: Name, dtype: object


In [7]:
df.groupby('Name').groups

{'Jeevan': Int64Index([0, 2], dtype='int64'),
 'Sameer': Int64Index([1], dtype='int64')}

In [8]:
# df.index holds the row labels; its length is the number of rows in the DataFrame.
len(df.index)

3

# Group-By: Split-Apply-Combine

- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
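
The three steps above can be sketched by hand before relying on the built-in shortcuts. This is only a minimal illustration: the inline table and the names `groups`, `means`, `combined`, and `shortcut` are assumptions made for this example, not part of the original notebook.

```python
import pandas as pd

# Small inline table with hypothetical values.
df = pd.DataFrame({
    'City': ['Mumbai', 'Delhi', 'Mumbai', 'Delhi'],
    'Population': [15, 25, 18, 30],
})

# 1. Split: one sub-DataFrame per distinct city.
groups = {city: part for city, part in df.groupby('City')}

# 2. Apply: compute the mean population of each group independently.
means = {city: part['Population'].mean() for city, part in groups.items()}

# 3. Combine: collect the per-group results into a single Series.
combined = pd.Series(means, name='Population')

# The chained groupby form performs the same three steps in one expression.
shortcut = df.groupby('City')['Population'].mean()

print(combined)
print(shortcut)
```

In practice the chained form is what you would normally write; the manual version is only there to make each step visible.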

#### groupby() splits the data into groups based on some criteria

In [9]:
# groupby
df=pd.read_csv('CityStatistics.csv')
df_split=df.groupby('City')
for nam,grp in df_split:
    print('Name of the group:',nam)
    print('Group:',grp)

Name of the group: Bengaluru
Group:         City  Population  Year
2  Bengaluru          17  2010
6  Bengaluru          19  2011
Name of the group: Delhi
Group:     City  Population  Year
1  Delhi          25  2010
3  Delhi          30  2011
Name of the group: Mumbai
Group:      City  Population  Year
0  Mumbai          15  2010
4  Mumbai          18  2012
5  Mumbai          20  2013


#### agg() applies one or more aggregation functions to each group independently and combines the results

In [10]:
df_split.agg({'Population':['mean','std']})

          Population
                mean       std
City
Bengaluru  18.000000  1.414214
Delhi      27.500000  3.535534
Mumbai     17.666667  2.516611
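
When the two-level column header produced by a dict of lists is inconvenient, named aggregation is one alternative. The sketch below assumes a pandas version that supports it (0.25 or later) and rebuilds the city table inline from the rows printed earlier instead of reading the CSV; the output column names `mean_pop` and `max_pop` are hypothetical.

```python
import pandas as pd

# Inline copy of the rows shown in the groupby output above.
df = pd.DataFrame({
    'City': ['Mumbai', 'Delhi', 'Bengaluru', 'Delhi',
             'Mumbai', 'Mumbai', 'Bengaluru'],
    'Population': [15, 25, 17, 30, 18, 20, 19],
    'Year': [2010, 2010, 2010, 2011, 2012, 2013, 2011],
})

# Each keyword names an output column; the tuple is (input column, function).
summary = df.groupby('City').agg(
    mean_pop=('Population', 'mean'),
    max_pop=('Population', 'max'),
)
print(summary)
```

The result has flat column names, which is usually easier to index than the `('Population', 'mean')` style produced by the dict form.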


In [11]:
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                    columns=['A', 'B', 'C'])
df.agg(['sum', 'min'])

      A   B   C
sum  12  15  18
min   1   2   3


#### apply(): Arbitrary functions can be applied along the axes of a DataFrame using the apply() method, which, like the descriptive statistics methods, takes an optional axis argument. By default (axis=0) the function is applied column-wise, receiving each column as an array-like Series.

In [2]:
df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
print(df)

   A  B
0  4  9
1  4  9
2  4  9


In [3]:
import numpy as np
df.apply(np.sqrt)

     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0


In [4]:
# Calculate the sum column-wise
df.apply(np.sum, axis=0)

A    12
B    27
dtype: int64

In [5]:
# Calculate the sum row-wise
df.apply(np.sum,axis=1)

0    13
1    13
2    13
dtype: int64
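
apply() is not limited to NumPy functions; any callable that accepts a Series can be passed. The sketch below uses a made-up helper, `value_range`, on the same 3x2 frame to show how the axis argument changes what the function receives.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

def value_range(values):
    """Return the spread (max minus min) of the Series passed in."""
    return values.max() - values.min()

# axis=0 (default): the function receives each column, one result per column.
col_range = df.apply(value_range)

# axis=1: the function receives each row, one result per row.
row_range = df.apply(value_range, axis=1)

print(col_range)
print(row_range)
```

Every column here holds a single repeated value, so the column-wise spread is 0, while each row spans from 4 to 9 and gives 5.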

# Counter Class in collections

In [6]:
from collections import Counter
# Create a list
z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
col_count = Counter(z)
print(col_count)

Counter({'blue': 3, 'red': 2, 'yellow': 1})
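
A Counter behaves like a dictionary with a few extra conveniences. The short sketch below reuses the same colour list; the variable names `top_two` and `missing` are chosen only for this example.

```python
from collections import Counter

colours = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
col_count = Counter(colours)

# The n most frequent items, as (item, count) pairs in descending order.
top_two = col_count.most_common(2)

# Looking up an item that was never seen returns 0 instead of raising KeyError.
missing = col_count['green']

# update() adds further observations to the existing counts in place.
col_count.update(['red', 'red'])

print(top_two)           # [('blue', 3), ('red', 2)]
print(missing)           # 0
print(col_count['red'])  # 4
```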


# pprint()

In [6]:
import pprint
data = {'a':2, 'b':{'x':3, 'y':{'t1': 4, 't2':5}}}
print(data)
pprint.pprint(data)

{'a': 2, 'b': {'x': 3, 'y': {'t1': 4, 't2': 5}}}
{'a': 2, 'b': {'x': 3, 'y': {'t1': 4, 't2': 5}}}
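
The two outputs above are identical because pprint only breaks a structure across lines when its one-line form does not fit within the configured width (80 characters by default). As a small sketch, the `width` and `depth` parameters can be used to change that behaviour; `pformat` returns the formatted string instead of printing it, and the names `narrow` and `shallow` are chosen just for this example.

```python
import pprint

data = {'a': 2, 'b': {'x': 3, 'y': {'t1': 4, 't2': 5}}}

# A small width forces pprint to wrap the nested dictionaries across lines.
narrow = pprint.pformat(data, width=20)
print(narrow)

# depth=1 keeps only the top level and replaces deeper levels with "...".
shallow = pprint.pformat(data, depth=1)
print(shallow)
```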


In [7]:
data = {'key1': 'value1',
        'key2': 'value2',
        'key3': {'key3a': 'value3a'},
        'key4': {'key4a': {'key4aa': 'value4aa',
                           'key4ab': 'value4ab',
                           'key4ac': 'value4ac'},
                 'key4b': 'value4b'}
        }
print("USING PRINT:\n",data)
print("\n\n USING PPRINT:\n")
pprint.pprint(data)

USING PRINT:
 {'key1': 'value1', 'key2': 'value2', 'key3': {'key3a': 'value3a'}, 'key4': {'key4a': {'key4aa': 'value4aa', 'key4ab': 'value4ab', 'key4ac': 'value4ac'}, 'key4b': 'value4b'}}


 USING PPRINT:

{'key1': 'value1',
 'key2': 'value2',
 'key3': {'key3a': 'value3a'},
 'key4': {'key4a': {'key4aa': 'value4aa',
                    'key4ab': 'value4ab',
                    'key4ac': 'value4ac'},
          'key4b': 'value4b'}}
