# Pandas tutorial

Pandas is an open-source, BSD-licensed Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics. In this tutorial, we will learn the main features of Pandas and how to use them in practice.

Using Pandas, we can accomplish the five typical steps in processing and analyzing data, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.

Benefits:
1) Flexibility of Python
2) Ability to work with large datasets

#### Find the dataset in the repository; the code below loads it as pokemon_data.csv

In [1]:
# Loading data into the editor

In [6]:
import pandas as pd
df = pd.read_csv('pokemon_data.csv')

print(df)

       #                   Name   Type 1  Type 2  HP  Attack  Defense  \
0      1              Bulbasaur    Grass  Poison  45      49       49   
1      2                Ivysaur    Grass  Poison  60      62       63   
2      3               Venusaur    Grass  Poison  80      82       83   
3      3  VenusaurMega Venusaur    Grass  Poison  80     100      123   
4      4             Charmander     Fire     NaN  39      52       43   
..   ...                    ...      ...     ...  ..     ...      ...   
795  719                Diancie     Rock   Fairy  50     100      150   
796  719    DiancieMega Diancie     Rock   Fairy  50     160      110   
797  720    HoopaHoopa Confined  Psychic   Ghost  80     110       60   
798  720     HoopaHoopa Unbound  Psychic    Dark  80     160       60   
799  721              Volcanion     Fire   Water  80     110      120   

     Sp. Atk  Sp. Def  Speed  Generation  Legendary  
0         65       65     45           1      False  
1         80   

In [6]:
# Use head() and tail() to view rows from the top or bottom of the data;
# pass the number of rows you want
import pandas as pd
df = pd.read_csv('pokemon_data.csv')
print(df.head(3))
# Similarly, print(df.tail(3)) shows the last three rows

   #       Name Type 1  Type 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  \
0  1  Bulbasaur  Grass  Poison  45      49       49       65       65     45   
1  2    Ivysaur  Grass  Poison  60      62       63       80       80     60   
2  3   Venusaur  Grass  Poison  80      82       83      100      100     80   

   Generation  Legendary  
0           1      False  
1           1      False  
2           1      False  
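
As a small sketch of looking at both ends of a table, the block below builds a four-row stand-in DataFrame (the name `sample` is hypothetical; the values are copied from the rows printed earlier) and takes the first and last two rows with `head` and `tail`.

```python
import pandas as pd

# Small stand-in for the Pokemon data (values copied from the rows shown earlier)
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
    "HP": [45, 60, 80, 39],
})

first_two = sample.head(2)   # first 2 rows
last_two = sample.tail(2)    # last 2 rows

print(first_two)
print(last_two)
```

Both methods return a new DataFrame and leave `sample` untouched. Called without an argument, they return five rows.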


In [7]:
# You can also read other file formats, such as Excel files, in the same way

#import pandas as pd
#df_xlsx = pd.read_excel('pokemon_data.xls')
#print(df_xlsx.head(3))

In [8]:
# List the column names (attributes) of the dataset
print(df.columns)

Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
       'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')


Pandas is conventionally used for analyzing and manipulating data, though it is not limited to that. Its two basic data structures are the Series and the DataFrame. A Series is a one-dimensional labeled array that can hold data of any type.

A DataFrame is a two-dimensional labeled table whose columns can each hold a different data type. A DataFrame can also be viewed as a collection of Series objects that share the same index.
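
To make the relationship between the two structures concrete, here is a minimal sketch that builds two Series and combines them into a DataFrame. The variable names (`hp`, `attack`, `stats`) are illustrative only; the numbers are the HP and Attack values of the first three rows of the dataset.

```python
import pandas as pd

# A Series: one-dimensional, labeled values
hp = pd.Series([45, 60, 80], name="HP")

# A second Series with the same default index 0..2
attack = pd.Series([49, 62, 82], name="Attack")

# A DataFrame built from the two Series; each Series becomes a column
stats = pd.DataFrame({"HP": hp, "Attack": attack})

print(stats.shape)          # (rows, columns)
print(type(stats["HP"]))    # selecting one column gives back a Series
```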

In [17]:
# Read a column and slice its first five rows by index
print(df['Name'][0:5])

0                Bulbasaur
1                  Ivysaur
2                 Venusaur
3    VenusaurMega Venusaur
4               Charmander
Name: Name, dtype: object
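
Beyond slicing a single column, rows and columns can also be picked by position or by label. The sketch below uses a hypothetical four-row `sample` frame (values taken from the rows shown earlier) to illustrate three common selection forms.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "HP": [45, 60, 80, 39],
})

# Several columns at once: pass a list of column names
subset = sample[["Name", "HP"]]

# Position-based access: the row at position 1, all columns
second_row = sample.iloc[1]

# Label-based access: rows labeled 0 through 2 (inclusive), one column
names = sample.loc[0:2, "Name"]

print(subset)
print(second_row)
print(names)
```

Note that a slice with `loc` includes its end label, while positional slices such as `[0:5]` or `iloc[0:5]` exclude the end position.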


## Sorting in Pandas

In [9]:
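import pandas as pd
df = pd.read_csv('pokemon_data.csv')
# Sort by 'Type 1' ascending, then by 'HP' descending within each type.
# sort_values returns a new DataFrame and leaves df unchanged, so assign
# the result (df = df.sort_values(...)) to keep the sorted order.
df.sort_values(['Type 1', 'HP'], ascending=[True, False])

# df was not reassigned, so this prints the original, unsorted order
print(df)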

       #                   Name   Type 1  Type 2  HP  Attack  Defense  \
0      1              Bulbasaur    Grass  Poison  45      49       49   
1      2                Ivysaur    Grass  Poison  60      62       63   
2      3               Venusaur    Grass  Poison  80      82       83   
3      3  VenusaurMega Venusaur    Grass  Poison  80     100      123   
4      4             Charmander     Fire     NaN  39      52       43   
..   ...                    ...      ...     ...  ..     ...      ...   
795  719                Diancie     Rock   Fairy  50     100      150   
796  719    DiancieMega Diancie     Rock   Fairy  50     160      110   
797  720    HoopaHoopa Confined  Psychic   Ghost  80     110       60   
798  720     HoopaHoopa Unbound  Psychic    Dark  80     160       60   
799  721              Volcanion     Fire   Water  80     110      120   

     Sp. Atk  Sp. Def  Speed  Generation  Legendary  
0         65       65     45           1      False  
1         80   
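
Because `sort_values` returns a new DataFrame rather than reordering the existing one, the sorted result has to be captured in a variable. A minimal sketch on a hypothetical four-row `sample` frame (values taken from the dataset rows shown earlier):

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "HP": [45, 60, 80, 39],
})

# 'Type 1' ascending, then 'HP' descending within each type
sorted_df = sample.sort_values(["Type 1", "HP"], ascending=[True, False])

print(sorted_df["Name"].tolist())
print(sample["Name"].tolist())   # the original frame keeps its order
```

Here the Fire row comes first, followed by the Grass rows from highest to lowest HP.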

## Making changes to the data

In [11]:
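# Explicit version: add up the six stat columns by name
#df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']

# To remove the column again:
# df = df.drop(columns=['Total'])

# Positional version: columns 4 to 9 are HP through Speed (the slice end, 10, is excluded)
df['Total'] = df.iloc[:, 4:10].sum(axis=1)

# Move 'Total' (now the last column) so it sits right after 'Type 2'
cols = list(df.columns)
df = df[cols[0:4] + [cols[-1]] + cols[4:12]]

df.head(5)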

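|   | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|------|--------|--------|-------|----|--------|---------|---------|---------|-------|------------|-----------|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |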
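
Selecting columns by position breaks silently if the column order ever changes. As a hedged alternative, the sketch below computes the same total by column name on a hypothetical two-row `sample` frame (stat values copied from the Bulbasaur and Charmander rows above).

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39], "Attack": [49, 52], "Defense": [49, 43],
    "Sp. Atk": [65, 60], "Sp. Def": [65, 50], "Speed": [45, 65],
})

# Name the stat columns explicitly instead of relying on their positions
stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
sample["Total"] = sample[stat_cols].sum(axis=1)

# Put 'Total' directly after 'Name' by listing the desired column order
sample = sample[["Name", "Total"] + stat_cols]

print(sample)
```

The totals come out as 318 and 309, matching the `Total` values in the table above.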


## Saving and exporting our data

In [14]:
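# Save as CSV without writing the index column
# df.to_csv('modified.csv', index=False)

# Save as an Excel workbook
#df.to_excel('modified.xlsx', index=False)

# Save as tab-separated text
df.to_csv('modified.txt', index=False, sep='\t')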
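
To check what an export actually contains without creating a file, the data can be written to an in-memory buffer and read back. This is only a sketch: `sample`, `buffer`, and `restored` are illustrative names, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

sample = pd.DataFrame({"Name": ["Bulbasaur", "Ivysaur"], "HP": [45, 60]})

# Write tab-separated text into an in-memory buffer instead of a file
buffer = io.StringIO()
sample.to_csv(buffer, index=False, sep="\t")

text = buffer.getvalue()
print(text)

# Read it back; the separator must match the one used when writing
restored = pd.read_csv(io.StringIO(text), sep="\t")
print(restored)
```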

## Filtering the Data

In [17]:
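# Keep rows that satisfy all three conditions; each condition needs its own parentheses
new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]

# Renumber the rows from 0 and discard the old index
new_df = new_df.reset_index(drop=True)

print(new_df)

# Without index=False, the new index is written as the first column of the file
new_df.to_csv('filtered.csv')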

     #                   Name Type 1  Type 2  Total   HP  Attack  Defense  \
0    3               Venusaur  Grass  Poison    525   80      82       83   
1    3  VenusaurMega Venusaur  Grass  Poison    625   80     100      123   
2   45              Vileplume  Grass  Poison    490   75      80       85   
3   71             Victreebel  Grass  Poison    490   80     105       65   
4  591              Amoonguss  Grass  Poison    464  114      85       70   

   Sp. Atk  Sp. Def  Speed  Generation  Legendary  
0      100      100     80           1      False  
1      122      120     80           1      False  
2      110       90     50           1      False  
3      100       70     70           1      False  
4       85       80     30           5      False  
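
Conditions can also be combined with `|` (or) and negated with `~` (not), and missing values can be tested with `isna`. The following sketch shows each on a hypothetical four-row `sample` frame whose values are taken from the dataset rows shown earlier.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", None],
    "HP": [45, 80, 80, 39],
})

# OR: Fire type, or HP above 70
fire_or_tough = sample[(sample["Type 1"] == "Fire") | (sample["HP"] > 70)]

# NOT: exclude names containing "Mega"
no_mega = sample[~sample["Name"].str.contains("Mega")]

# Rows with no secondary type (missing 'Type 2')
single_type = sample[sample["Type 2"].isna()]

print(fire_or_tough["Name"].tolist())
print(no_mega["Name"].tolist())
print(single_type["Name"].tolist())
```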


## Aggregate statistics (Grouping)

In [21]:
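import pandas as pd
df = pd.read_csv('pokemon_data.csv')

# Helper column of ones, so that counting it gives the number of rows in each group
df['count'] = 1

# Number of Pokemon for each ('Type 1', 'Type 2') combination
df.groupby(['Type 1', 'Type 2']).count()['count']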

Type 1  Type 2  
Bug     Electric     2
        Fighting     2
        Fire         2
        Flying      14
        Ghost        1
                    ..
Water   Ice          3
        Poison       3
        Psychic      5
        Rock         4
        Steel        1
Name: count, Length: 136, dtype: int64
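
Group sizes can also be obtained without a helper column by calling `size()`, and other aggregations such as `mean()` work the same way. A minimal sketch on a hypothetical four-row `sample` frame (values from the dataset rows shown earlier):

```python
import pandas as pd

sample = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "HP": [45, 60, 80, 39],
    "Attack": [49, 62, 82, 52],
})

# Number of rows in each group, no helper column needed
counts = sample.groupby("Type 1").size()

# Average of selected columns within each group
means = sample.groupby("Type 1")[["HP", "Attack"]].mean()

print(counts)
print(means)
```

One thing to keep in mind: by default `groupby` drops rows whose grouping key is missing, so grouping on `Type 2` leaves out Pokemon that have no secondary type.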

## Working with large amounts of data

In [24]:
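# Read the file in chunks of 5 rows so the whole file never has to be in memory at once
chunk_results = []

for chunk in pd.read_csv('pokemon_data.csv', chunksize=5):
    # Per-chunk counts of non-null values for each 'Type 1'
    chunk_results.append(chunk.groupby(['Type 1']).count())

# Stack the per-chunk counts, then sum the rows that share a 'Type 1' across chunks
new_df = pd.concat(chunk_results).groupby(level=0).sum()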
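
The chunked pattern can be tried without the CSV file by reading from an in-memory string. In this sketch, `csv_text` is a made-up four-row extract (values from the rows shown earlier) and `chunksize=2` is chosen only to force more than one chunk; real workloads would use far larger chunks.

```python
import io
import pandas as pd

csv_text = """Name,Type 1,HP
Bulbasaur,Grass,45
Ivysaur,Grass,60
Venusaur,Grass,80
Charmander,Fire,39
"""

partial_counts = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Each chunk is an ordinary DataFrame holding at most 2 rows
    partial_counts.append(chunk.groupby("Type 1").size())

# The same type can appear in several chunks, so add the partial counts together
total_counts = pd.concat(partial_counts).groupby(level=0).sum()

print(total_counts)
```

Summing the partial results is what makes the chunked answer equal to the one computed on the whole file; simply concatenating them would leave one row per type per chunk.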