# Intro to Pandas

[Data Science Handbook (with notebooks!)](https://jakevdp.github.io/PythonDataScienceHandbook/)

[Basics of Pandas](https://towardsdatascience.com/6-basic-pandas-techniques-you-need-to-know-2c5725746938)

[Pandas cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

[A good guide to rows and columns](https://www.geeksforgeeks.org/dealing-with-rows-and-columns-in-pandas-dataframe/)

Pandas is a Python library for data manipulation and data analysis, widely used in data science. A Pandas *DataFrame* is a table with labeled rows and columns. Typically each row holds one data point and each column holds one feature of that data point.
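To make the row/column picture concrete, here is a minimal sketch that builds a tiny table by hand. The names and values are made up for illustration and are not part of any dataset used below.

```python
import pandas as pd

# Hypothetical toy data: one row per person, one column per feature.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, 25, 40],
})

# shape is a (number of rows, number of columns) tuple.
n_rows, n_cols = people.shape
print(people)
print(n_rows, "rows and", n_cols, "columns")
```

Each dictionary key becomes a column, and each position in the lists becomes a row.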

In [1]:
import pandas as pd
from sklearn import datasets

## Converting from format X to DataFrame

In [2]:
#List of numbers to DataFrame:

num_list = [1,2,3,4,5]
df = pd.DataFrame(num_list)
df

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [3]:
#List of tuples to DataFrame:

num_list = [(1,2),(3,4),(5,3)]
df = pd.DataFrame(num_list)
df

Unnamed: 0,0,1
0,1,2
1,3,4
2,5,3


In [4]:
#Dictionary to DataFrame:

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)

Unnamed: 0,col_1,col_2
0,3,a
1,2,b
2,1,c
3,0,d
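The dictionary above maps column names to lists of values. When the keys should instead become row labels, `from_dict` accepts `orient='index'`. The sketch below uses made-up row labels (`row_1`, `row_2`) purely for illustration.

```python
import pandas as pd

# Hypothetical data: each key is a row label, each list is one row of values.
rows = {"row_1": [3, "a"], "row_2": [2, "b"]}

# orient='index' treats keys as row labels; columns= names the columns.
by_row = pd.DataFrame.from_dict(rows, orient="index", columns=["col_1", "col_2"])
print(by_row)
```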


In [5]:
# Fixed-width text file (columns padded with spaces) to DataFrame:

df = pd.read_fwf('./primitivo.txt')
df

FileNotFoundError: [Errno 2] No such file or directory: './primitivo.txt'
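The cell above fails only because `primitivo.txt` is not present at that path. As a sketch of what `read_fwf` does, the example below reads fixed-width text from an in-memory string instead of a file. The text content and the `widths` values are assumptions chosen for this illustration, not the contents of the missing file.

```python
import io
import pandas as pd

# Hypothetical fixed-width text standing in for a file on disk:
# the first field is 7 characters wide, the second is 3.
text = (
    "name   age\n"
    "Siri    15\n"
    "Laura    6\n"
    "Oscar    5\n"
)

# read_fwf accepts a path or a file-like object. widths gives the field
# widths explicitly; if it is omitted, pandas tries to infer the boundaries.
fwf_df = pd.read_fwf(io.StringIO(text), widths=[7, 3])
print(fwf_df)
```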

In [5]:
#Excel file to DataFrame:

df = pd.read_excel('../datasets/person.xlsx')
df

Unnamed: 0,Name,Age,Gender
0,Siri,15,f
1,Laura,6,f
2,Oscar,5,m


In [6]:
#Csv file to DataFrame:

df = pd.read_csv('../datasets/GDP-2015.csv')
df

Unnamed: 0,Entity,Code,Year,GDP per capita
0,Afghanistan,AFG,2015,1928
1,Albania,ALB,2015,10947
2,Algeria,DZA,2015,13024
3,Angola,AGO,2015,8631
4,Argentina,ARG,2015,19316
...,...,...,...,...
162,Venezuela,VEN,2015,16257
163,Vietnam,VNM,2015,5733
164,Yemen,YEM,2015,2496
165,Zambia,ZMB,2015,3537
166,Zimbabwe,ZWE,2015,1759


In [23]:
iris = datasets.load_iris()
type(iris)

sklearn.utils.Bunch

In [22]:
# scikit-learn dataset (Bunch object) to DataFrame:

iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
iris_df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


## Looking at DataFrames

In [8]:
df.head()  # first 5 rows

Unnamed: 0,Entity,Code,Year,GDP per capita
0,Afghanistan,AFG,2015,1928
1,Albania,ALB,2015,10947
2,Algeria,DZA,2015,13024
3,Angola,AGO,2015,8631
4,Argentina,ARG,2015,19316


In [9]:
df.tail(3)  #last 3 rows

Unnamed: 0,Entity,Code,Year,GDP per capita
164,Yemen,YEM,2015,2496
165,Zambia,ZMB,2015,3537
166,Zimbabwe,ZWE,2015,1759


In [10]:
df.describe()

Unnamed: 0,Year,GDP per capita
count,167.0,167.0
mean,2015.0,18216.598802
std,0.0,19305.364946
min,2015.0,605.0
25%,2015.0,3705.0
50%,2015.0,11738.0
75%,2015.0,25843.0
max,2015.0,139542.0


## Working with DataFrames

Grab a column:

In [11]:
df.columns

Index(['Entity', 'Code', 'Year', 'GDP per capita'], dtype='object')

In [12]:
countries = df['Entity']
countries

0      Afghanistan
1          Albania
2          Algeria
3           Angola
4        Argentina
          ...     
162      Venezuela
163        Vietnam
164          Yemen
165         Zambia
166       Zimbabwe
Name: Entity, Length: 167, dtype: object

In [13]:
gdp = df['GDP per capita']
gdp

0       1928
1      10947
2      13024
3       8631
4      19316
       ...  
162    16257
163     5733
164     2496
165     3537
166     1759
Name: GDP per capita, Length: 167, dtype: int64

Grab an entry:

In [14]:
# Select by row label and column name in one step (row 3 is Angola)
gdpAngola = df.loc[3, 'GDP per capita']
gdpAngola

8631
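There are several ways to pick out a single entry. The sketch below rebuilds a three-row excerpt of the GDP table (values copied from the output above) so that it runs without the CSV file, and compares label-based, position-based and condition-based selection. The variable names are illustrative.

```python
import pandas as pd

# Three-row excerpt using the column names of the GDP table.
gdp_df = pd.DataFrame({
    "Entity": ["Albania", "Algeria", "Angola"],
    "GDP per capita": [10947, 13024, 8631],
})

# .loc selects by row label and column name.
by_label = gdp_df.loc[2, "GDP per capita"]

# .iloc selects by integer position: third row, second column.
by_position = gdp_df.iloc[2, 1]

# A boolean mask selects rows by value, so no row number is needed.
mask = gdp_df["Entity"] == "Angola"
by_name = gdp_df.loc[mask, "GDP per capita"].iloc[0]

print(by_label, by_position, by_name)
```

Selecting by value is usually the most robust choice, since row numbers change when the table is sorted or filtered.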

## Small example

In [15]:
personer = pd.read_excel('../datasets/person.xlsx')
personer['Name']

0     Siri
1    Laura
2    Oscar
Name: Name, dtype: object

Add a column:

In [16]:
personer['HasBike'] = True
personer.head()

Unnamed: 0,Name,Age,Gender,HasBike
0,Siri,15,f,True
1,Laura,6,f,True
2,Oscar,5,m,True


Save changes to a file:

In [17]:
personer.to_excel('../datasets/person_ny.xlsx')
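By default, the `to_*` writers also store the row labels as an extra column, which then comes back as an unnamed column when the file is read again. Passing `index=False` avoids that. The sketch below shows a round trip through CSV rather than Excel, only because writing `.xlsx` needs an additional Excel engine package. The file name and the temporary directory are assumptions for this example.

```python
import os
import tempfile
import pandas as pd

persons = pd.DataFrame({"Name": ["Siri", "Laura", "Oscar"], "Age": [15, 6, 5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person_demo.csv")  # hypothetical file name
    # index=False: do not write the row labels 0, 1, 2 as a column.
    persons.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored)
```

The same `index=False` argument is accepted by `to_excel`.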