# pandas

pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool,
built on top of the Python programming language.

---

## Import 

In [1]:
import pandas as pd

---

## pandas Series

In [2]:
values = [1, 2, 3, 4, 5]

In [3]:
pd.Series(values)

0    1
1    2
2    3
3    4
4    5
dtype: int64
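
The left-hand column in the output above is the index, which defaults to the positions 0 to n-1. The following is a minimal sketch of supplying custom labels instead; the labels `a` to `e` are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

values = [1, 2, 3, 4, 5]

# The labels 'a'..'e' are hypothetical; any hashable labels would work.
s = pd.Series(values, index=['a', 'b', 'c', 'd', 'e'])

# Label-based lookup returns the value stored under that label.
third = s['c']

# Arithmetic is applied element-wise and preserves the index.
doubled = s * 2
```

Here `third` holds the value stored under label `c`, and `doubled` keeps the same five labels as `s`.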

---

## pandas DataFrames

pandas is well suited to tabular data, such as data stored in spreadsheets or databases. It helps you explore, clean, and process that data. In pandas, a data table is called a `DataFrame`.

![DataFrame](https://pandas.pydata.org/docs/_images/01_table_dataframe.svg)
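
As the figure suggests, a DataFrame is a set of named columns that share one row index. One common way to build a small table is from a dictionary that maps column names to lists of values. This is a sketch only: the column names and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
table = pd.DataFrame({
    'name': ['Ada', 'Grace', 'Linus'],
    'age': [36, 45, 28],
})

# shape is (number of rows, number of columns).
n_rows, n_cols = table.shape

# Column names follow the order of the dictionary keys.
column_names = list(table.columns)
```

Each dictionary key becomes a column, and every list must have the same length because each row spans all columns.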

In [4]:
from numpy.random import randn

In [5]:
# 6x4 table of standard-normal samples with explicit row labels and column names
df = pd.DataFrame(randn(6, 4), index=[0, 1, 2, 3, 4, 5], columns=['A', 'B', 'C', 'D'])

In [6]:
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.789602 | -1.611856 | 0.531649 | -1.244406 |
| 1 | 0.58135 | -0.582899 | -0.452991 | -1.271945 |
| 2 | 1.793954 | -0.450444 | 0.304306 | 1.29953 |
| 3 | -0.871678 | -0.047553 | -0.011403 | -0.649625 |
| 4 | 0.34338 | -0.351982 | -0.149634 | 1.074231 |
| 5 | 2.446575 | 2.225359 | 0.312606 | 1.452146 |


In [7]:
type(df)

pandas.core.frame.DataFrame

In [8]:
df['A']

0    0.789602
1    0.581350
2    1.793954
3   -0.871678
4    0.343380
5    2.446575
Name: A, dtype: float64

In [9]:
# Add a new column E as the element-wise sum of columns A and B
df['E'] = df['A'] + df['B']

In [10]:
df

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 0.789602 | -1.611856 | 0.531649 | -1.244406 | -0.822254 |
| 1 | 0.58135 | -0.582899 | -0.452991 | -1.271945 | -0.001549 |
| 2 | 1.793954 | -0.450444 | 0.304306 | 1.29953 | 1.34351 |
| 3 | -0.871678 | -0.047553 | -0.011403 | -0.649625 | -0.919232 |
| 4 | 0.34338 | -0.351982 | -0.149634 | 1.074231 | -0.008602 |
| 5 | 2.446575 | 2.225359 | 0.312606 | 1.452146 | 4.671934 |


In [11]:
# Select the rows labelled 1 and 3 and the columns A and C
df.loc[[1, 3], ['A', 'C']]

|   | A | C |
|---|---|---|
| 1 | 0.58135 | -0.452991 |
| 3 | -0.871678 | -0.011403 |
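
`loc` selects by label. Two related selection styles are position-based `iloc` and boolean masks. The sketch below uses a small deterministic frame (the integers 0 to 23) in place of the random values above, an assumption made so the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

# Deterministic 6x4 frame holding 0..23, used here instead of random numbers.
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=['A', 'B', 'C', 'D'])

# Label-based: rows labelled 1 and 3, columns named A and C.
by_label = df.loc[[1, 3], ['A', 'C']]

# Position-based: second and fourth rows, first and third columns.
by_position = df.iloc[[1, 3], [0, 2]]

# Boolean mask: keep only the rows where column A is greater than 10.
large_a = df[df['A'] > 10]
```

With the default 0 to n-1 index, labels and positions coincide, so `by_label` and `by_position` contain the same cells. They differ as soon as the index is reordered or replaced with custom labels.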


---

## Read a CSV file

In [12]:
data = pd.read_csv(r'C:\Github\100-Days-of-ML-Code\Datasets\vgsales.csv')

In [13]:
type(data)

pandas.core.frame.DataFrame
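
`pd.read_csv` also accepts a file-like object, which makes it possible to try the call without the file on disk. The sketch below assumes a two-row stand-in with only four of the dataset's columns; the names `csv_text`, `sample`, and `ranked` are introduced here for illustration.

```python
import io

import pandas as pd

# Two-row stand-in for the CSV file, with a subset of its columns.
csv_text = """Rank,Name,Platform,Year
1,Wii Sports,Wii,2006
2,Super Mario Bros.,NES,1985
"""

# Default behaviour: the first line is the header and rows get a 0..n-1 index.
sample = pd.read_csv(io.StringIO(csv_text))

# index_col uses an existing column (here Rank) as the row index instead.
ranked = pd.read_csv(io.StringIO(csv_text), index_col='Rank')
```

With `index_col='Rank'`, rows can be looked up by rank, for example `ranked.loc[1, 'Name']`.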

In [14]:
data.head(5)

|   | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


In [15]:
data.tail(5)

|   | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 |
| 16597 | 16600 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |


In [16]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
Rank            16598 non-null int64
Name            16598 non-null object
Platform        16598 non-null object
Year            16327 non-null float64
Genre           16598 non-null object
Publisher       16540 non-null object
NA_Sales        16598 non-null float64
EU_Sales        16598 non-null float64
JP_Sales        16598 non-null float64
Other_Sales     16598 non-null float64
Global_Sales    16598 non-null float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
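
The non-null counts above reveal missing data: `Year` has 16327 non-null values out of 16598 entries (271 missing) and `Publisher` has 16540 (58 missing). One way to count and handle such gaps is sketched below on a hypothetical three-row stand-in, since the real file is not bundled here; the game names are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with gaps in the same two columns as the real data.
sample = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C'],
    'Year': [2006.0, np.nan, 2008.0],
    'Publisher': ['Nintendo', 'Kemco', None],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = sample.isna().sum()

# dropna(subset=...) keeps only rows that have values in the listed columns.
complete_rows = sample.dropna(subset=['Year', 'Publisher'])
```

Whether to drop rows or fill the gaps (for example with `fillna`) depends on the analysis. Dropping is shown here only because it is the simplest option.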


In [17]:
data.describe()

|   | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16598.0 | 16327.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 |
| mean | 8300.605254 | 2006.406443 | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
| std | 4791.853933 | 5.828981 | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
| min | 1.0 | 1980.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 25% | 4151.25 | 2003.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.06 |
| 50% | 8300.5 | 2007.0 | 0.08 | 0.02 | 0.0 | 0.01 | 0.17 |
| 75% | 12449.75 | 2010.0 | 0.24 | 0.11 | 0.04 | 0.04 | 0.47 |
| max | 16600.0 | 2020.0 | 41.49 | 29.02 | 10.22 | 10.57 | 82.74 |
