# Pandas Tutorial

### Pandas Introduction

Pandas is a Python library used for working with data sets.
It is used for data manipulation, analysis and cleaning.
Pandas lets us analyze large data sets, clean messy data, and make it readable and relevant.

# Pandas Series

A Pandas Series is like a column in a table.
It is a one-dimensional array that can hold data of any type. The values are usually all of one type (all ints, all floats, and so on).
If the values are of mixed types, the Series gets the object dtype.

#### 1. Program to create simple series

In [1]:
import pandas as pd

In [2]:
a = [10,20,30]
s = pd.Series(a)
print(s)
print(type(s))

0    10
1    20
2    30
dtype: int64
<class 'pandas.core.series.Series'>
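
To illustrate how the element types decide the dtype, here is a minimal sketch. The variable names and sample values are only for illustration; the exact integer width printed may depend on the platform.

```python
import pandas as pd

# All integers: pandas picks an integer dtype.
ints = pd.Series([10, 20, 30])

# All floats: pandas picks a floating-point dtype.
floats = pd.Series([1.5, 2.5, 3.5])

# Mixed types (int, str, float): pandas falls back to the generic object dtype.
mixed = pd.Series([10, "twenty", 30.0])

print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)
```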


## Labels
If nothing else is specified, the values are labeled with their index number, starting from 0.
A label can be used to access a specific value.

With the index argument, we can assign our own labels. The number of labels must equal the number of elements in the data.

In [3]:
a1 = [10,20,30]
s1 = pd.Series(a1,index=['row1','row2','row3'])
print(s1)

row1    10
row2    20
row3    30
dtype: int64


We can access the elements of the Series using these labels.

In [4]:
print(s1['row2'])

20
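
As a small sketch of the two points above, the example below reads a value by label and by position, and then shows what happens when the number of labels does not match the number of values. The names by_label, by_position and mismatch_error are illustrative only.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["row1", "row2", "row3"])

by_label = s1.loc["row2"]    # explicit label-based access
by_position = s1.iloc[1]     # explicit position-based access (second element)
print(by_label, by_position)

# Passing two labels for three values is rejected with a ValueError.
try:
    pd.Series([10, 20, 30], index=["row1", "row2"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```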


#### 2. Program to create series using key/value structure

We can also create a Series from a key/value object such as a dictionary. The keys of the dictionary become the labels. To include only some of the items in the dictionary, pass the index argument and list only the keys you want in the Series.

In [5]:
calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)
print(myvar)

day1    420
day2    380
day3    390
dtype: int64


In [6]:
# pass only the required keys in the index argument
c = {"day1": 420, "day2": 380, "day3": 390}

myvar1 = pd.Series(c, index=['day1','day3'])
print(myvar1)

day1    420
day3    390
dtype: int64
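
One related case worth knowing: if the index argument names a key that the dictionary does not contain, pandas does not raise an error but fills that position with NaN, which also turns the dtype into float. A minimal sketch, where "day4" is a hypothetical key used only for illustration:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical key that is not in the dictionary.
partial = pd.Series(calories, index=["day1", "day4"])
print(partial)

# The missing entry can be detected with isna().
print(partial.isna())
```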


# Pandas DataFrames

Data sets in Pandas are usually two-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

#### 3. Program to create a DataFrame of students' roll numbers, names and marks

In [7]:
data={
    'roll':[101,102,103],
    'name':['abhishek','amit','jack'],
    'marks':[85,78,96]
}
table = pd.DataFrame(data)
print(table)
type(table)

   roll      name  marks
0   101  abhishek     85
1   102      amit     78
2   103      jack     96


pandas.core.frame.DataFrame
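
To make the column/table relationship concrete, the sketch below pulls one column out of the same table and checks its type. The names marks and subset are illustrative.

```python
import pandas as pd

data = {
    "roll": [101, 102, 103],
    "name": ["abhishek", "amit", "jack"],
    "marks": [85, 78, 96],
}
table = pd.DataFrame(data)

marks = table["marks"]              # a single column is a Series
subset = table[["roll", "marks"]]   # a list of columns gives a DataFrame

print(type(marks).__name__)
print(type(subset).__name__)
print(table.shape)                  # (number of rows, number of columns)
```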

## Locate Rows

Using the loc attribute, we can return one or more rows by their index labels. If custom labels were given when the DataFrame was created, we select rows using those labels.

In [8]:
data1={
    'roll':[101,102,103],
    'name':['abhishek','amit','jack'],
    'marks':[85,78,96]
}
table1 = pd.DataFrame(data1, index=['R1','R2','R3'])
print(table1)
print('\n')
print(table1.loc['R2'])
# the line above returns a Series

print('\n')
print(table1.loc[['R1','R2']])  # pass a list of labels to select multiple rows
# the line above returns a DataFrame


    roll      name  marks
R1   101  abhishek     85
R2   102      amit     78
R3   103      jack     96


roll      102
name     amit
marks      78
Name: R2, dtype: object


    roll      name  marks
R1   101  abhishek     85
R2   102      amit     78


## Reading CSV

In [9]:
a = pd.read_csv('Pandas Tutorial CSV.csv')

In [10]:
a  # large data is shown truncated; use a.to_string() to display every row

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
...       ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
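
Since the CSV file itself is not included here, the following sketch uses an in-memory string as a stand-in for it (an assumption made so the example runs anywhere), built from the first five rows of the output above. It shows how to_string() returns the full, untruncated text.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (first five rows of the output above).
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
45,109,175,282.4
45,117,148,406.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

# to_string() renders every row and column as one string, with no truncation.
full_text = sample.to_string()
print(full_text)

# The display option that decides when a long frame is abbreviated.
print(pd.options.display.max_rows)
```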


## Creating CSV

In [24]:
data1={
    'roll':[101,102,103,104],
    'name':['abhishek','amit','jack','mach'],
    'marks':[85,78,96,56]
}

df = pd.DataFrame(data1)
df.to_csv('Create Pandas CSV.csv', index=False)  # to_csv is a DataFrame method, not a function on pd

# index=False leaves the index values out of the written file
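
To see what index=False changes, the sketch below writes the same data with and without the index. When no path is given, to_csv returns the CSV text instead of writing a file, which is used here so that nothing is written to disk. The variable names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "roll": [101, 102, 103, 104],
    "name": ["abhishek", "amit", "jack", "mach"],
    "marks": [85, 78, 96, 56],
})

without_index = df.to_csv(index=False)  # no path given, so the CSV text is returned
with_index = df.to_csv()                # default is index=True

print(without_index.splitlines()[0])    # header row without the index
print(with_index.splitlines()[0])       # header row starts with an empty field

# Reading the indexed version back gives the saved index a placeholder column name.
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```

This is why a file saved with the default settings often comes back with an extra column named Unnamed: 0.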

#### Viewing data in a DataFrame

In [25]:
# to print first n rows

df.head(2)

   roll      name  marks
0   101  abhishek     85
1   102      amit     78


In [26]:
# to get last n rows

df.tail(3)

   roll  name  marks
1   102  amit     78
2   103  jack     96
3   104  mach     56


In [27]:
# to get summary statistics for the numeric columns (those holding int or float values)

df.describe()

           roll      marks
count       4.0        4.0
mean      102.5      78.75
std    1.290994  16.879475
min       101.0       56.0
25%      101.75       72.5
50%       102.5       81.5
75%      103.25      87.75
max       104.0       96.0
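
The result of describe() is itself a DataFrame, so single statistics can be read from it with loc. The sketch below also shows head() without an argument and describe(include="all"), which adds the non-numeric name column to the summary. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "roll": [101, 102, 103, 104],
    "name": ["abhishek", "amit", "jack", "mach"],
    "marks": [85, 78, 96, 56],
})

first_rows = df.head()                    # with no argument, head() returns up to 5 rows
stats = df.describe()                     # numeric columns only
mean_marks = stats.loc["mean", "marks"]   # read one statistic from the summary

print(len(first_rows))
print(mean_marks)

# include="all" also summarises non-numeric columns such as name.
full_stats = df.describe(include="all")
print(full_stats)
```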


### Updating a value in a row

In [30]:
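# change the value of marks in the first row
# chained indexing (not recommended): it triggers the warning shown below
# prefer a single indexing step: df.loc[0, 'marks'] = 77
df['marks'][0] = 77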

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
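
The warning above comes from chained indexing, where df['marks'] is evaluated first and the assignment is applied to its result. A minimal sketch of the usual alternative is to select the row and the column in one step with loc (or at, for a single cell). Depending on the pandas version, the chained form may not modify df at all, so the single-step form is the safer habit.

```python
import pandas as pd

df = pd.DataFrame({
    "roll": [101, 102, 103, 104],
    "name": ["abhishek", "amit", "jack", "mach"],
    "marks": [85, 78, 96, 56],
})

# One indexing step: row label 0, column 'marks'.
df.loc[0, "marks"] = 77

# at is a scalar-only accessor, handy for reading or writing one cell.
updated_value = df.at[0, "marks"]

print(updated_value)
print(df)
```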
  


In [31]:
df

   roll      name  marks
0   101  abhishek     77
1   102      amit     78
2   103      jack     96
3   104      mach     56


In [32]:
df.index

RangeIndex(start=0, stop=4, step=1)

In [33]:
df.columns

Index(['roll', 'name', 'marks'], dtype='object')
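
The index and columns attributes can be turned into plain lists or used for renaming. A small sketch, where the new column name score is hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "roll": [101, 102, 103, 104],
    "name": ["abhishek", "amit", "jack", "mach"],
    "marks": [85, 78, 96, 56],
})

row_labels = list(df.index)       # RangeIndex expanded into a plain list
column_names = list(df.columns)   # column labels as a plain list
print(row_labels)
print(column_names)
print(df.shape)                   # (number of rows, number of columns)

# rename returns a new DataFrame; df itself keeps its original column names.
renamed = df.rename(columns={"marks": "score"})
print(list(renamed.columns))
```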