# Pandas Series

Note:

1. pandas is an open-source Python library used for data manipulation and data analysis.

2. It provides two main data structures: Series and DataFrame.

# Series

- A Series is a one-dimensional labeled array, similar to a list or a single column of a table.
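A minimal sketch of what "labeled array" means in practice: every Series pairs its values with an index of labels, and pandas stores the whole Series with a single dtype. The variable name `marks` is only for illustration.

```python
import pandas as pd

# Without an explicit index, pandas assigns default integer labels 0, 1, 2
marks = pd.Series([45, 20, 46])

print(marks.index.tolist())   # the labels: [0, 1, 2]
print(marks.values.tolist())  # the underlying data: [45, 20, 46]
print(marks.dtype)            # one dtype for the whole Series
```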


In [2]:
#importing pandas library
import pandas as pd

### Creating a sample Series

In [3]:
series1 = pd.Series([1,2,3,'string',4.3,-4])

In [4]:
series1

0         1
1         2
2         3
3    string
4       4.3
5        -4
dtype: object

In [5]:
type(series1)

pandas.core.series.Series

### Adding a custom index

In [8]:
index = ['a','b','c','d','e','f']
series2 = pd.Series([1,2,3,'string',4.3,-4],index = index)

In [9]:
series2

a         1
b         2
c         3
d    string
e       4.3
f        -4
dtype: object

### Accessing elements of a Series

In [10]:
series2['a']

1

In [11]:
series2['d']

'string'
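Plain square brackets look elements up by label here. A short sketch of the two explicit accessors pandas offers: `.loc` always selects by label and `.iloc` always selects by integer position, which avoids any ambiguity when the index is not the default 0, 1, 2, ... The variable names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

index = ['a', 'b', 'c', 'd', 'e', 'f']
series2 = pd.Series([1, 2, 3, 'string', 4.3, -4], index=index)

# .loc selects by label, .iloc selects by integer position
by_label = series2.loc['d']
by_position = series2.iloc[3]
print(by_label, by_position)  # both are 'string'
```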

### Creating a Series from a dictionary

In [12]:
series3 = pd.Series({'London':20,'Tripoli':100,'Cairo':50})

In [13]:
series3

London      20
Tripoli    100
Cairo       50
dtype: int64

In [15]:
series3['Cairo']

50
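When a Series is built from a dictionary, the keys become the index. A sketch of one more detail: passing `index=` as well selects and orders the keys to keep, and any label that is not a key becomes `NaN`. `'Paris'` is a hypothetical extra label added only to show the missing-value case.

```python
import pandas as pd

data = {'London': 20, 'Tripoli': 100, 'Cairo': 50}

# index= chooses which keys to keep and in what order;
# 'Paris' is not a key, so its value is NaN (and the dtype becomes float)
series4 = pd.Series(data, index=['Cairo', 'London', 'Paris'])
print(series4)
```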

### Filtering with Logical Conditions

In [16]:
series3[series3>25]

Tripoli    100
Cairo       50
dtype: int64
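`series3 > 25` produces a boolean Series, and indexing with it keeps only the `True` entries. A minimal sketch of combining two conditions: pandas uses `&` (and) and `|` (or) rather than Python's `and`/`or`, and each condition needs its own parentheses. The upper bound of 80 is an arbitrary value chosen for illustration.

```python
import pandas as pd

series3 = pd.Series({'London': 20, 'Tripoli': 100, 'Cairo': 50})

# Keep values greater than 25 AND less than 80
between = series3[(series3 > 25) & (series3 < 80)]
print(between)  # only Cairo (50) satisfies both conditions
```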

# Pandas DataFrames

Key Points

1. DataFrames are used when the data has multiple columns (features), whereas a Series is used when there is only one column.
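One way to see the relationship between the two structures, as a sketch using the student data from this section: a DataFrame behaves like a collection of Series that share one index, and selecting a single column gives back a Series.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Marks': [45, 20, 46, 29]})

print(df.shape)                            # (rows, columns)
print(type(df['Marks']))                   # each column is a Series
print(df['Marks'].index.equals(df.index))  # columns share the DataFrame's index
```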
    
    

In [18]:
#importing pandas library
import pandas as pd

### Creating a Sample DataFrame

In [21]:
column1 = ['sai','ajay','ramu','somu']
column2 = [45,20,46,29]

df = pd.DataFrame({'Names': column1,'Marks': column2})
df

|   | Names | Marks |
|---|-------|-------|
| 0 | sai   | 45    |
| 1 | ajay  | 20    |
| 2 | ramu  | 46    |
| 3 | somu  | 29    |


### Let's get the first 2 rows

In [22]:
df.head(2)

|   | Names | Marks |
|---|-------|-------|
| 0 | sai   | 45    |
| 1 | ajay  | 20    |


### Let's get the last 2 rows

In [23]:
df.tail(2)

|   | Names | Marks |
|---|-------|-------|
| 2 | ramu  | 46    |
| 3 | somu  | 29    |


### Listing all column names

In [24]:
df.columns

Index(['Names', 'Marks'], dtype='object')

In [25]:
# Let's rename all the columns
df.columns = ['Stud_Names','Stud_marks']

In [26]:
df

|   | Stud_Names | Stud_marks |
|---|------------|------------|
| 0 | sai        | 45         |
| 1 | ajay       | 20         |
| 2 | ramu       | 46         |
| 3 | somu       | 29         |
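Assigning to `df.columns` replaces every name at once, so the new list must have one entry per column. A sketch of the alternative when only some columns need new names: `DataFrame.rename` with a `columns=` mapping changes just the listed columns and returns a new DataFrame. The name `renamed` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Marks': [45, 20, 46, 29]})

# Only 'Marks' is renamed; df itself is left unchanged
renamed = df.rename(columns={'Marks': 'Stud_marks'})
print(renamed.columns.tolist())  # ['Names', 'Stud_marks']
print(df.columns.tolist())       # ['Names', 'Marks']
```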


### Let's change the row labels (index)

In [27]:
df.index = ['a','b','c','d']
df

|   | Stud_Names | Stud_marks |
|---|------------|------------|
| a | sai        | 45         |
| b | ajay       | 20         |
| c | ramu       | 46         |
| d | somu       | 29         |


In [28]:
# Let's select only one column (Stud_marks); the result is a Series
df['Stud_marks']

a    45
b    20
c    46
d    29
Name: Stud_marks, dtype: int64

In [29]:
# Let's delete a single column (Stud_marks); this modifies df in place
del df['Stud_marks']
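`del` removes the column from `df` itself. As a sketch of the non-destructive alternative, `DataFrame.drop(columns=...)` returns a new DataFrame without the column and leaves the original untouched. The data below mirrors the state of `df` just before the deletion; `without_marks` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Stud_Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Stud_marks': [45, 20, 46, 29]},
                  index=['a', 'b', 'c', 'd'])

# drop() returns a new DataFrame; df still has both columns
without_marks = df.drop(columns=['Stud_marks'])
print(without_marks.columns.tolist())  # ['Stud_Names']
print(df.columns.tolist())             # ['Stud_Names', 'Stud_marks']
```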

In [31]:
column1 = ['sai','ajay','ramu','somu']
column2 = [45,20,46,29]
column3 = ['A','D','A','B']

df = pd.DataFrame({'Names': column1,'Marks': column2,'Grade': column3})
df

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 1 | ajay  | 20    | D     |
| 2 | ramu  | 46    | A     |
| 3 | somu  | 29    | B     |


### Let's access the rows at positions 1 to 3

- With `iloc`, the end of a slice is exclusive, so `1:4` returns the rows at positions 1, 2 and 3.

In [32]:
df.iloc[1:4]

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 1 | ajay  | 20    | D     |
| 2 | ramu  | 46    | A     |
| 3 | somu  | 29    | B     |
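The exclusive end applies to position-based `iloc`. A short sketch contrasting it with label-based `loc`, whose slices include the end label. With the default index the labels happen to equal the positions, which makes the difference easy to see.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Marks': [45, 20, 46, 29],
                   'Grade': ['A', 'D', 'A', 'B']})

by_position = df.iloc[1:3]  # positions 1 and 2 (end excluded)
by_label = df.loc[1:3]      # labels 1, 2 and 3 (end included)
print(len(by_position), len(by_label))  # 2 3
```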


### Let's access rows 1 to 3, but only the Grade column (position 2)

In [34]:
df.iloc[1:4, 2]  # df.iloc[rows, columns]

1    D
2    A
3    B
Name: Grade, dtype: object

### Accessing all rows and columns 

In [35]:
df.iloc[:,:]

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 1 | ajay  | 20    | D     |
| 2 | ramu  | 46    | A     |
| 3 | somu  | 29    | B     |


### Filtering Rows with Conditions

- Let's get all students who scored more than 30 marks

In [36]:
df[df['Marks']>30]

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 2 | ramu  | 46    | A     |


- Let's see who got Grade A

In [38]:
df[df['Grade']=='A']

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 2 | ramu  | 46    | A     |
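To match a column against several values at once, a sketch using `Series.isin` is shown below, together with `~` to invert a condition. The names `a_or_b` and `not_a` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Marks': [45, 20, 46, 29],
                   'Grade': ['A', 'D', 'A', 'B']})

# Rows whose Grade is either A or B
a_or_b = df[df['Grade'].isin(['A', 'B'])]
# ~ negates a boolean condition: rows whose Grade is not A
not_a = df[~(df['Grade'] == 'A')]

print(a_or_b['Names'].tolist())  # ['sai', 'ramu', 'somu']
print(not_a['Names'].tolist())   # ['ajay', 'somu']
```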


- Let's access multiple columns

In [40]:
df[['Names','Grade']]

|   | Names | Grade |
|---|-------|-------|
| 0 | sai   | A     |
| 1 | ajay  | D     |
| 2 | ramu  | A     |
| 3 | somu  | B     |


# Creating and Reading CSV files

In [41]:
#importing pandas library
import pandas as pd

In [42]:
# Creating a sample DataFrame
column1 = ['sai','ajay','ramu','somu']
column2 = [45,20,46,29]
column3 = ['A','D','A','B']

df = pd.DataFrame({'Names': column1,'Marks': column2,'Grade': column3})
df

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 1 | ajay  | 20    | D     |
| 2 | ramu  | 46    | A     |
| 3 | somu  | 29    | B     |


### Saving this DataFrame as a CSV file

Format:

dataframe_name.to_csv('csv_filename')

In [44]:
df.to_csv('student_marks.csv', index=False)  # index=False: don't write the row index (0, 1, 2, 3) to the file

In [45]:
# List the files in the current working directory to confirm the CSV was created
import os
os.listdir()


### Reading CSV files 

In [46]:
df1 = pd.read_csv('student_marks.csv')
df1

|   | Names | Marks | Grade |
|---|-------|-------|-------|
| 0 | sai   | 45    | A     |
| 1 | ajay  | 20    | D     |
| 2 | ramu  | 46    | A     |
| 3 | somu  | 29    | B     |
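To see why `index=False` matters, here is a sketch of a round trip kept in memory: called without a path, `to_csv()` returns the CSV text as a string, and `io.StringIO` lets `read_csv` read it back. If the index is written, it comes back as an extra column named `Unnamed: 0`.

```python
import io

import pandas as pd

df = pd.DataFrame({'Names': ['sai', 'ajay', 'ramu', 'somu'],
                   'Marks': [45, 20, 46, 29],
                   'Grade': ['A', 'D', 'A', 'B']})

# Default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# index=False: only the real columns are written
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Names', 'Marks', 'Grade']
print(without_index.columns.tolist())  # ['Names', 'Marks', 'Grade']
```

If a file was saved with its index, passing `index_col=0` to `read_csv` turns that first column back into the index instead of a data column.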


# Pandas GroupBy

1. `DataFrame.groupby()` splits the data into groups based on some criterion, such as the values in a column.

2. After grouping, an aggregate function such as mean, sum, max or min is applied to each group, and the results are combined into one table.
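The two steps above are often called split-apply-combine. A sketch, using the same student marks data as this section, that performs them by hand with a loop and checks the result against the single `groupby` call. The name `manual` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Yeswanth', 'Ajay', 'Kiran', 'Ajay', 'Yeswanth',
              'Kiran', 'Yeswanth', 'Ajay', 'Kiran'],
    'Marks': [45, 29, 49, 39, 22, 30, 35, 41, 26],
    'Subject': ['Maths', 'Physics', 'Social', 'Maths', 'Social',
                'Maths', 'Physics', 'Social', 'Physics'],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
# Apply: sum the Marks of each sub-DataFrame
manual = {name: group['Marks'].sum() for name, group in df.groupby('Names')}

# groupby + sum performs split, apply and combine in one call
built_in = df.groupby('Names')['Marks'].sum()
print(manual)
print(built_in.to_dict() == manual)  # True
```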

In [47]:
#importing pandas library
import pandas as pd

In [49]:
#Creating a Dataframe
# Each row holds a student's name, the marks scored, and the subject
column1 = ["Yeswanth",'Ajay','Kiran',"Ajay","Yeswanth","Kiran","Yeswanth","Ajay","Kiran"]
column2 = [45,29,49,39,22,30,35,41,26]
column3 = ['Maths','Physics','Social','Maths','Social','Maths','Physics','Social','Physics']

df = pd.DataFrame({'Names': column1,'Marks': column2,'Subject': column3})
df

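|   | Names    | Marks | Subject |
|---|----------|-------|---------|
| 0 | Yeswanth | 45    | Maths   |
| 1 | Ajay     | 29    | Physics |
| 2 | Kiran    | 49    | Social  |
| 3 | Ajay     | 39    | Maths   |
| 4 | Yeswanth | 22    | Social  |
| 5 | Kiran    | 30    | Maths   |
| 6 | Yeswanth | 35    | Physics |
| 7 | Ajay     | 41    | Social  |
| 8 | Kiran    | 26    | Physics |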


In [50]:
# There are 3 students (Yeswanth, Ajay and Kiran), each with marks in Maths, Physics and Social

### Let's see who scored the highest total marks

In [51]:
df.groupby('Names')[['Marks']].agg('sum')  # select Marks so the text column Subject is not aggregated

| Names    | Marks |
|----------|-------|
| Ajay     | 109   |
| Kiran    | 105   |
| Yeswanth | 102   |


In [52]:
# Ajay has the highest total (109), ahead of Kiran (105) and Yeswanth (102)

### Let's get the mean scores

In [53]:
df.groupby('Names')[['Marks']].agg('mean')  # the mean of the text column Subject is undefined, so select Marks only

| Names    | Marks     |
|----------|-----------|
| Ajay     | 36.333333 |
| Kiran    | 35.0      |
| Yeswanth | 34.0      |
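Instead of running one aggregation per cell, `agg` also accepts a list of function names and returns one column per statistic. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Yeswanth', 'Ajay', 'Kiran', 'Ajay', 'Yeswanth',
              'Kiran', 'Yeswanth', 'Ajay', 'Kiran'],
    'Marks': [45, 29, 49, 39, 22, 30, 35, 41, 26],
    'Subject': ['Maths', 'Physics', 'Social', 'Maths', 'Social',
                'Maths', 'Physics', 'Social', 'Physics'],
})

# One row per student, one column per statistic of their Marks
summary = df.groupby('Names')['Marks'].agg(['sum', 'mean', 'max', 'min'])
print(summary)
```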


In [54]:
# Ajay also has the highest mean score

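### Let's see in which subject each student scored their highest marks

In [55]:
# idxmax() returns the row label of each student's highest mark; .loc then fetches those rows
df.loc[df.groupby('Names')['Marks'].idxmax()]

|   | Names    | Marks | Subject |
|---|----------|-------|---------|
| 7 | Ajay     | 41    | Social  |
| 2 | Kiran    | 49    | Social  |
| 0 | Yeswanth | 45    | Maths   |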
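The same pattern works with any column as the grouping key. As a sketch, grouping by `Subject` instead of `Names` gives the average mark per subject, and `idxmax()` on the result names the subject with the highest average.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Yeswanth', 'Ajay', 'Kiran', 'Ajay', 'Yeswanth',
              'Kiran', 'Yeswanth', 'Ajay', 'Kiran'],
    'Marks': [45, 29, 49, 39, 22, 30, 35, 41, 26],
    'Subject': ['Maths', 'Physics', 'Social', 'Maths', 'Social',
                'Maths', 'Physics', 'Social', 'Physics'],
})

# Average mark in each subject across all students
subject_mean = df.groupby('Subject')['Marks'].mean()
print(subject_mean)
print(subject_mean.idxmax())  # Maths (mean 38.0)
```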
