<center><h1>Pandas</h1></center>

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.


- #### Major application areas: data manipulation and data visualisation
- #### Pandas data structures: DataFrame and Series

> Document Link: https://pandas.pydata.org/docs/user_guide/index.html

In [1]:
# install the pandas library (the leading ! runs a shell command from the notebook)
!pip install pandas



In [3]:
import pandas as pd

### Series

In [4]:
# creating series from tuple:
character_series = pd.Series(('L','u','t','h','o','r'))
print('series: ',character_series,sep='\n')
print('type: ',type(character_series))

series: 
0    L
1    u
2    t
3    h
4    o
5    r
dtype: object
type:  <class 'pandas.core.series.Series'>


In [5]:
# creating series from list:
integer_series = pd.Series([11,12,13,14,15,16])
print('series: ',integer_series,sep='\n')
print('type: ',type(integer_series))

series: 
0    11
1    12
2    13
3    14
4    15
5    16
dtype: int64
type:  <class 'pandas.core.series.Series'>
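A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below assumes made-up subject names and marks purely for illustration:

```python
import pandas as pd

# dict keys become the index labels, dict values become the data
marks_series = pd.Series({'math': 90, 'physics': 78, 'chemistry': 85}, name='marks')
print(marks_series)

# elements can then be looked up by label
print(marks_series['physics'])
```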


In [6]:
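# creating a range of dates
# note: pd.date_range returns a DatetimeIndex, not a Series
# ambiguous strings such as '01-04-2020' are parsed month-first (mm-dd-yyyy) by default;
# ISO 'yyyy-mm-dd' strings avoid the ambiguity

date_series = pd.date_range(start='2020-01-01', end='2020-01-04')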
print('Date range: ',date_series, sep='\n', end='\n\n')
print('type :', type(date_series))

Date range: 
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'], dtype='datetime64[ns]', freq='D')

type : <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
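Because strings such as `'01-04-2020'` are ambiguous, it can help to check how pandas reads them. `pd.date_range` can also be driven by a number of periods and a frequency instead of an end date. A small sketch; the weekly frequency is only an illustrative choice:

```python
import pandas as pd

# by default an ambiguous date string is read month-first ...
jan_4 = pd.to_datetime('01-04-2020')
# ... and dayfirst=True switches to day-first
apr_1 = pd.to_datetime('01-04-2020', dayfirst=True)
print(jan_4, apr_1)

# three weekly dates; 'W' is anchored on Sundays, so the first one is 2020-01-05
weekly = pd.date_range(start='2020-01-01', periods=3, freq='W')
print(weekly)
```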


In [12]:
# some of the date attributes available on a DatetimeIndex:
print('all the years: ',date_series.year,sep='\n',end='\n\n')
print('all the months: ',date_series.month,sep='\n',end='\n\n')
print('all the month names: ',date_series.month_name(),sep='\n',end='\n\n')
print('all the days: ',date_series.day,sep='\n',end='\n\n')
print('all the day names: ',date_series.day_name(),sep='\n',end='\n\n')
print('which day of the year are they: ',date_series.day_of_year,sep='\n',end='\n\n')

all the years: 
Int64Index([2020, 2020, 2020, 2020], dtype='int64')

all the months: 
Int64Index([1, 1, 1, 1], dtype='int64')

all the month names: 
Index(['January', 'January', 'January', 'January'], dtype='object')

all the days: 
Int64Index([1, 2, 3, 4], dtype='int64')

all the day names: 
Index(['Wednesday', 'Thursday', 'Friday', 'Saturday'], dtype='object')

which day of the year are they: 
Int64Index([1, 2, 3, 4], dtype='int64')
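The attributes above are read straight off the DatetimeIndex. When the dates sit in a Series (for example a DataFrame column), the same attributes are reached through the `.dt` accessor. A minimal sketch using the same four dates:

```python
import pandas as pd

# wrap the DatetimeIndex in a Series, as it would be if it were a DataFrame column
date_col = pd.Series(pd.date_range(start='2020-01-01', end='2020-01-04'))

# on a Series the date attributes live under .dt
print(date_col.dt.year)
print(date_col.dt.day_name())
print(date_col.dt.dayofyear)
```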



In [14]:
# indexing:
integer_series = pd.Series([11,12,13,14,15,16])
print('0th element in the series: ',integer_series[0],end ='\n\n')
print('1st 4 elements in the series: ',integer_series[0:4], sep='\n',end ='\n\n')
print('1st and 4th element in the series: ',integer_series[[0,3]], sep='\n')

0th element in the series:  11

1st 4 elements in the series: 
0    11
1    12
2    13
3    14
dtype: int64

1st and 4th element in the series: 
0    11
3    14
dtype: int64


In [19]:
# setting custom index labels
integer_series = pd.Series([11,12,13,14,15,16], index=[101, 102, 103, 104, 105, 106])
print('new index series:',integer_series,sep='\n',end='\n\n')
# with an integer index, [] looks up a label, not a position
print('element with label 103 :',integer_series[103])

new index series:
101    11
102    12
103    13
104    14
105    15
106    16
dtype: int64

element with label 103 : 13


In [21]:
# a shorter way to create the same index, using range()
integer_series = pd.Series([11,12,13,14,15,16], index=range(101,107))
integer_series

101    11
102    12
103    13
104    14
105    15
106    16
dtype: int64
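With a custom integer index, `integer_series[103]` is a label lookup, and a position such as `integer_series[0]` would raise a `KeyError` because there is no label 0. The `.loc` (label) and `.iloc` (position) accessors make the intent explicit; a short sketch:

```python
import pandas as pd

integer_series = pd.Series([11, 12, 13, 14, 15, 16], index=range(101, 107))

third_by_label = integer_series.loc[103]     # label 103
third_by_position = integer_series.iloc[2]   # position 2 (0-based)
last_two = integer_series.iloc[-2:]          # negative positions need .iloc

print(third_by_label, third_by_position)
print(last_two)
```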

### Data Frames

In [22]:
# creating a dataframe from a dictionary

student_df = pd.DataFrame({'Name': ['student_1','student_2', 'student_3', 'student_4', 'student_5'],
              'Roll_number': [1, 5, 8, 10, 3],
             'Math_marks' : [90, 89, 78, 98, 68]})
student_df

        Name  Roll_number  Math_marks
0  student_1            1          90
1  student_2            5          89
2  student_3            8          78
3  student_4           10          98
4  student_5            3          68
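Besides a dictionary of columns, a DataFrame can be built from a list of records, one dictionary per row, with the keys becoming column names. A sketch reusing the first two students from the table above:

```python
import pandas as pd

# one dict per row; the keys become the column names
records = [
    {'Name': 'student_1', 'Roll_number': 1, 'Math_marks': 90},
    {'Name': 'student_2', 'Roll_number': 5, 'Math_marks': 89},
]
records_df = pd.DataFrame(records)
print(records_df)
```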


In [23]:
# using the roll numbers as the row index (labels) instead of the default 0..n-1
Roll_number = [1, 5, 8, 10, 3]
student_df = pd.DataFrame({'Name': ['student_1','student_2', 'student_3', 'student_4', 'student_5'],
                           'Math_marks' : [90, 89, 78, 98, 68]}, index=Roll_number)
student_df

         Name  Math_marks
1   student_1          90
5   student_2          89
8   student_3          78
10  student_4          98
3   student_5          68
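Passing `index=` only labels the rows; it does not reorder them, which is why roll number 3 still appears last. To order the rows by roll number, `sort_index` can be applied afterwards. A minimal sketch; naming the index `Roll_number` is an optional, illustrative step:

```python
import pandas as pd

student_df = pd.DataFrame({'Name': ['student_1', 'student_2', 'student_3', 'student_4', 'student_5'],
                           'Math_marks': [90, 89, 78, 98, 68]},
                          index=[1, 5, 8, 10, 3])
student_df.index.name = 'Roll_number'   # optional: give the index a name

# sort the rows by their index labels (ascending by default)
sorted_df = student_df.sort_index()
print(sorted_df)
```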


In [30]:
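# reading a csv file into a dataframe (important)
# similar readers exist for other structured formats, e.g. pd.read_excel() for Excel files
# (replace the path with the location of your own copy of the file)

dummy_df = pd.read_csv(r'C:\Users\ASUS\Desktop\dummy_data.csv')
# two equivalent ways to write the same Windows path without the r'' prefix:
# dummy_df = pd.read_csv('C:\\Users\\ASUS\\Desktop\\dummy_data.csv')
# dummy_df = pd.read_csv("C:/Users/ASUS/Desktop/dummy_data.csv")
# use the 'id' column as the row index
dummy_df = dummy_df.set_index('id')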
dummy_df

   first_name  last_name       gender
id
1        Dodi  MacCurley         Male
2        Pooh     Casado  Genderqueer
3     Krispin  Govinlock      Agender
4     Tiphany     Dabney     Bigender
5       Derry     Fehner  Genderfluid
6   Heriberto   Behninck     Bigender
7      Michal       Gath       Female
8      Stella   Shadwick  Genderfluid
9    Consuelo       Asty   Polygender
10     Amabel   Mortimer         Male
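The index can also be set while reading, via the `index_col` parameter of `read_csv`, which gives the same result as calling `set_index('id')` afterwards. To keep the sketch self-contained, a tiny in-memory CSV (the first three rows from the output above) stands in for `dummy_data.csv`:

```python
import io
import pandas as pd

# in-memory stand-in for dummy_data.csv (first three rows only)
csv_text = """id,first_name,last_name,gender
1,Dodi,MacCurley,Male
2,Pooh,Casado,Genderqueer
3,Krispin,Govinlock,Agender
"""

# index_col='id' makes the id column the row index while reading
csv_df = pd.read_csv(io.StringIO(csv_text), index_col='id')
print(csv_df)
```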


### Common methods and attributes

In [37]:
# view the first n rows (n defaults to 5)
dummy_df.head(6)

   first_name  last_name       gender
id
1        Dodi  MacCurley         Male
2        Pooh     Casado  Genderqueer
3     Krispin  Govinlock      Agender
4     Tiphany     Dabney     Bigender
5       Derry     Fehner  Genderfluid
6   Heriberto   Behninck     Bigender


In [36]:
# view the last n rows (n defaults to 5)
dummy_df.tail(6)

   first_name last_name       gender
id
5       Derry    Fehner  Genderfluid
6   Heriberto  Behninck     Bigender
7      Michal      Gath       Female
8      Stella  Shadwick  Genderfluid
9    Consuelo      Asty   Polygender
10     Amabel  Mortimer         Male


In [38]:
# concise summary: index type, column names, non-null counts, dtypes and memory usage
dummy_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 1 to 10
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   first_name  10 non-null     object
 1   last_name   10 non-null     object
 2   gender      10 non-null     object
dtypes: object(3)
memory usage: 320.0+ bytes


In [39]:
# shape of the data
dummy_df.shape

(10, 3)

In [40]:
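# summary statistics; for text (object) columns these are the count, the number of
# unique values, the most frequent value (top) and how often it occurs (freq)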
dummy_df.describe()

       first_name last_name gender
count          10        10     10
unique         10        10      7
top          Dodi    Casado   Male
freq            1         1      2
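For numeric columns `describe()` reports a different set of statistics: count, mean, standard deviation, minimum, quartiles and maximum. A sketch using the `Math_marks` values from the student table earlier:

```python
import pandas as pd

marks_df = pd.DataFrame({'Math_marks': [90, 89, 78, 98, 68]})

# numeric summary: count, mean, std, min, 25%, 50% (median), 75%, max
stats = marks_df.describe()
print(stats)
```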


In [44]:
# names of the columns
dummy_df.columns

Index(['first_name', 'last_name', 'gender'], dtype='object')

In [50]:
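# the underlying data as a NumPy array, one inner array per row
# (dummy_df.to_numpy() is the equivalent method recommended by the pandas docs)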
print('all rows: ',dummy_df.values, sep='\n', end ='\n\n')
print('row in the 0th index: ',dummy_df.values[0])

all rows: 
[['Dodi' 'MacCurley' 'Male']
 ['Pooh' 'Casado' 'Genderqueer']
 ['Krispin' 'Govinlock' 'Agender']
 ['Tiphany' 'Dabney' 'Bigender']
 ['Derry' 'Fehner' 'Genderfluid']
 ['Heriberto' 'Behninck' 'Bigender']
 ['Michal' 'Gath' 'Female']
 ['Stella' 'Shadwick' 'Genderfluid']
 ['Consuelo' 'Asty' 'Polygender']
 ['Amabel' 'Mortimer' 'Male']]

row in the 0th index:  ['Dodi' 'MacCurley' 'Male']


In [51]:
# number of non-null values in each column (pass axis=1 to count per row)
dummy_df.count()

first_name    10
last_name     10
gender        10
dtype: int64
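Every column above has 10 non-null values, so the output does not show what `count()` skips. A sketch on a small hypothetical frame with two missing entries, counting per column and per row:

```python
import numpy as np
import pandas as pd

# hypothetical data with two missing entries
df_with_gaps = pd.DataFrame({'first_name': ['Dodi', None, 'Krispin'],
                             'gender': ['Male', 'Genderqueer', np.nan]})

per_column = df_with_gaps.count()        # default axis=0: one count per column
per_row = df_with_gaps.count(axis=1)     # one count per row
print(per_column)
print(per_row)
```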

In [52]:
# how often each distinct value occurs in the column, most frequent first
dummy_df['gender'].value_counts()

Male           2
Bigender       2
Genderfluid    2
Polygender     1
Agender        1
Female         1
Genderqueer    1
Name: gender, dtype: int64
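`value_counts` can also return proportions instead of raw counts through its `normalize` parameter. A sketch with the gender values copied from the output above:

```python
import pandas as pd

# the gender column from dummy_df
gender = pd.Series(['Male', 'Genderqueer', 'Agender', 'Bigender', 'Genderfluid',
                    'Bigender', 'Female', 'Genderfluid', 'Polygender', 'Male'], name='gender')

# normalize=True divides each count by the total number of values
shares = gender.value_counts(normalize=True)
print(shares)
```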

In [57]:
# selecting rows by position with a slice
# a single integer does not work the same way: dummy_df[0] looks for a *column* named 0

dummy_df[0:1]

   first_name  last_name gender
id
1        Dodi  MacCurley   Male


In [58]:
# every second row, from position 1 up to (but not including) position 8
dummy_df[1:8:2]

   first_name last_name       gender
id
2        Pooh    Casado  Genderqueer
4     Tiphany    Dabney     Bigender
6   Heriberto  Behninck     Bigender
8      Stella  Shadwick  Genderfluid


In [61]:
# selecting a column
# 1. First method: bracket notation [preferred]

print('Series output:',dummy_df['first_name'],sep ='\n',end='\n\n')
print(type(dummy_df['gender']))

Series output:
id
1          Dodi
2          Pooh
3       Krispin
4       Tiphany
5         Derry
6     Heriberto
7        Michal
8        Stella
9      Consuelo
10       Amabel
Name: first_name, dtype: object

<class 'pandas.core.series.Series'>


In [60]:
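# selecting a column
# 2. Second method: attribute access (only works when the column name is a valid
#    Python identifier and does not clash with an existing DataFrame attribute or method)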

print('Series output:',dummy_df.gender,sep ='\n',end='\n\n')
print(type(dummy_df.gender))

Series output:
id
1            Male
2     Genderqueer
3         Agender
4        Bigender
5     Genderfluid
6        Bigender
7          Female
8     Genderfluid
9      Polygender
10           Male
Name: gender, dtype: object

<class 'pandas.core.series.Series'>
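Attribute access is why bracket notation is the preferred method: it breaks down for names containing spaces and for names that coincide with DataFrame methods. A sketch with hypothetical column names chosen to show both cases:

```python
import pandas as pd

# hypothetical column names: one with a space, one that shadows a DataFrame method
df = pd.DataFrame({'first name': ['Dodi', 'Pooh'], 'count': [3, 5]})

print(df['first name'])   # a space in the name: only bracket notation works
print(df['count'])        # brackets return the column ...
print(df.count)           # ... but df.count is the DataFrame.count method
```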


In [62]:
# selecting more than one column (subsetting the df)

print('Data Frame output:',dummy_df[['gender','first_name']],
      sep ='\n',end='\n\n')

print(type(dummy_df[['gender','first_name']]))

Data Frame output:
         gender first_name
id                        
1          Male       Dodi
2   Genderqueer       Pooh
3       Agender    Krispin
4      Bigender    Tiphany
5   Genderfluid      Derry
6      Bigender  Heriberto
7        Female     Michal
8   Genderfluid     Stella
9    Polygender   Consuelo
10         Male     Amabel

<class 'pandas.core.frame.DataFrame'>


###### indexing:
- label-based indexing using <code>df.loc</code>
- position-based indexing using <code>df.iloc</code>
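One practical difference between the two accessors shows up in slices: `.loc` includes the end label, while `.iloc` follows Python's usual rule and excludes the end position. A sketch on a small frame with the first four rows of `dummy_df`:

```python
import pandas as pd

# a small frame shaped like dummy_df (first four rows, two columns)
df = pd.DataFrame({'first_name': ['Dodi', 'Pooh', 'Krispin', 'Tiphany'],
                   'gender': ['Male', 'Genderqueer', 'Agender', 'Bigender']},
                  index=pd.Index([1, 2, 3, 4], name='id'))

by_label = df.loc[2:3, 'gender']    # labels 2 to 3: the end label IS included
by_position = df.iloc[2:3, 1]       # positions 2 to 3: the end position is NOT included
print(by_label)
print(by_position)
```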

In [63]:
dummy_df

   first_name  last_name       gender
id
1        Dodi  MacCurley         Male
2        Pooh     Casado  Genderqueer
3     Krispin  Govinlock      Agender
4     Tiphany     Dabney     Bigender
5       Derry     Fehner  Genderfluid
6   Heriberto   Behninck     Bigender
7      Michal       Gath       Female
8      Stella   Shadwick  Genderfluid
9    Consuelo       Asty   Polygender
10     Amabel   Mortimer         Male


In [64]:
# value in the row labelled 10, column 'gender':
dummy_df.loc[10,'gender']

'Male'

In [65]:
# the whole row labelled 10 (returned as a Series)
print(dummy_df.loc[10],end='\n\n')
print(type(dummy_df.loc[10]))

first_name      Amabel
last_name     Mortimer
gender            Male
Name: 10, dtype: object

<class 'pandas.core.series.Series'>


In [67]:
# selecting a single element:
# row at position 0, column at position 1
dummy_df.iloc[0,1]

'MacCurley'
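For reading a single value, pandas also provides `.at` (labels) and `.iat` (positions), which accept only scalar keys and are intended for fast scalar access. A sketch on two rows copied from `dummy_df`:

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Dodi', 'Amabel'],
                   'last_name': ['MacCurley', 'Mortimer'],
                   'gender': ['Male', 'Male']},
                  index=pd.Index([1, 10], name='id'))

# .at takes labels, .iat takes positions; both return a single value
print(df.at[10, 'gender'])   # row label 10, column 'gender'
print(df.iat[0, 1])          # row position 0, column position 1
```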

In [70]:
# the 10th row, i.e. the row at position 9
print(dummy_df.iloc[9])
type(dummy_df.iloc[9])

first_name      Amabel
last_name     Mortimer
gender            Male
Name: 10, dtype: object


pandas.core.series.Series

In [71]:
dummy_df

   first_name  last_name       gender
id
1        Dodi  MacCurley         Male
2        Pooh     Casado  Genderqueer
3     Krispin  Govinlock      Agender
4     Tiphany     Dabney     Bigender
5       Derry     Fehner  Genderfluid
6   Heriberto   Behninck     Bigender
7      Michal       Gath       Female
8      Stella   Shadwick  Genderfluid
9    Consuelo       Asty   Polygender
10     Amabel   Mortimer         Male


In [72]:
# conditional (boolean) filtering: keep only the rows where gender is 'Male'
dummy_df[dummy_df.gender == 'Male']

   first_name  last_name gender
id
1        Dodi  MacCurley   Male
10     Amabel   Mortimer   Male
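Several conditions can be combined with `&` (and) and `|` (or), each wrapped in its own parentheses, and `isin()` matches against a list of allowed values. A sketch on the first five rows of `dummy_df`:

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Dodi', 'Pooh', 'Krispin', 'Tiphany', 'Derry'],
                   'gender': ['Male', 'Genderqueer', 'Agender', 'Bigender', 'Genderfluid']},
                  index=pd.Index([1, 2, 3, 4, 5], name='id'))

# rows where gender is 'Male' OR the id is greater than 4
male_or_late = df[(df['gender'] == 'Male') | (df.index > 4)]

# rows whose gender is one of the listed values
fluid_or_bi = df[df['gender'].isin(['Bigender', 'Genderfluid'])]
print(male_or_late)
print(fluid_or_bi)
```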


In [73]:
# rows whose index label (id) is greater than 3
dummy_df[dummy_df.index > 3]

   first_name last_name       gender
id
4     Tiphany    Dabney     Bigender
5       Derry    Fehner  Genderfluid
6   Heriberto  Behninck     Bigender
7      Michal      Gath       Female
8      Stella  Shadwick  Genderfluid
9    Consuelo      Asty   Polygender
10     Amabel  Mortimer         Male
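The same boolean mask can be passed to `.loc` together with a column label, filtering rows and selecting a column in one step. A sketch on the first five rows of `dummy_df`:

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Dodi', 'Pooh', 'Krispin', 'Tiphany', 'Derry'],
                   'gender': ['Male', 'Genderqueer', 'Agender', 'Bigender', 'Genderfluid']},
                  index=pd.Index([1, 2, 3, 4, 5], name='id'))

# boolean mask for the rows, label for the column
names_after_3 = df.loc[df.index > 3, 'first_name']
print(names_after_3)
```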


In [None]:
# your code goes here


In [None]:
# your code goes here
