# Getting started with Pandas

In [1]:
import pandas as pd

# DataFrame
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.

To manually store data in a table, create a DataFrame. When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the DataFrame.

In [None]:
# Build a DataFrame from a dictionary of lists:
# each key becomes a column header, each list becomes that column's values.
df = pd.DataFrame({
    'Name': ['Olivia', 'Nathan', 'Hannah', 'Noah'],
    'Age': [56, 30, 42, 23],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Role': ['Housewife', 'Soldier', 'Tailor', 'Student'],
})
df

,Name,Age,Sex,Role
0,Olivia,56,Female,Housewife
1,Nathan,30,Male,Soldier
2,Hannah,42,Female,Tailor
3,Noah,23,Male,Student


# Each column in a DataFrame is a Series
The Name, Age, Sex and Role columns are therefore each a Series.

In [6]:
# Selecting a single column of a pandas DataFrame returns a pandas Series
df['Age']

0    56
1    30
2    42
3    23
Name: Age, dtype: int64

A pandas Series has no column labels, as it is just a single column of a DataFrame. A Series does have row labels.
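The point about labels can be checked directly. The following is a minimal sketch that rebuilds a two-column version of the table above so it runs on its own; it inspects the `name` and `index` attributes of the selected column.

```python
import pandas as pd

# Rebuild a small table so the snippet runs on its own.
df = pd.DataFrame({
    'Name': ['Olivia', 'Nathan', 'Hannah', 'Noah'],
    'Age': [56, 30, 42, 23],
})

age = df['Age']              # selecting one column returns a Series
print(type(age).__name__)    # the class name of the result
print(age.name)              # the former column header is kept as the Series name
print(list(age.index))       # the row labels of the Series

# The Series carries the same row labels as the DataFrame it came from.
print(list(age.index) == list(df.index))
```

So the column header is not lost: it survives as the name of the Series, while the row labels are shared with the parent DataFrame.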

In [7]:
# You can also create a Series from scratch
ages = pd.Series([56, 30, 42, 23])
ages

0    56
1    30
2    42
3    23
dtype: int64
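A Series built from scratch gets the default row labels 0, 1, 2, ... and no name. As a small sketch, the `index` and `name` parameters of `pd.Series` can supply both; using the people's names as row labels here is an illustrative choice, not something the cells above do.

```python
import pandas as pd

# Same ages as above, but with explicit row labels and a name.
ages_by_name = pd.Series(
    [56, 30, 42, 23],
    index=['Olivia', 'Nathan', 'Hannah', 'Noah'],  # illustrative row labels
    name='Age',
)

print(ages_by_name['Hannah'])   # look a value up by its row label
print(ages_by_name.name)        # the name given at construction
```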

# Do something with a DataFrame or Series

To determine the maximum age of the people in the table, apply the max() method.

In [8]:
# On a column selected from a DataFrame
df['Age'].max()

np.int64(56)

In [9]:
# On a Series
ages.max()

np.int64(56)
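max() is one of several reductions available on a Series. The sketch below, which rebuilds part of the table so it runs on its own, also uses `min()`, `mean()` and `idxmax()`; the last one returns the row label of the maximum, which makes it possible to look up who the oldest person is.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Olivia', 'Nathan', 'Hannah', 'Noah'],
    'Age': [56, 30, 42, 23],
})

print(df['Age'].min())     # smallest age
print(df['Age'].mean())    # average age

# idxmax() gives the row label of the largest value, not the value itself.
oldest_row = df['Age'].idxmax()
print(df.loc[oldest_row, 'Name'])   # name stored in that row
```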

Basic Statistics of the numerical data

The describe() method provides a quick overview of the numerical data in a DataFrame. As the Name, Sex and Role columns contain textual data, they are by default not taken into account by the describe() method.

In [10]:
df.describe()

,Age
count,4.0
mean,37.75
std,14.476993
min,23.0
25%,28.25
50%,36.0
75%,45.5
max,56.0


In [11]:
ages.describe()

count     4.000000
mean     37.750000
std      14.476993
min      23.000000
25%      28.250000
50%      36.000000
75%      45.500000
max      56.000000
dtype: float64
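The default of skipping text columns can be overridden. A minimal sketch, assuming the same four-person table as above: passing `include='all'` to `describe()` summarises every column, adding rows such as `unique` and `freq` for the text columns and leaving statistics that do not apply as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Olivia', 'Nathan', 'Hannah', 'Noah'],
    'Age': [56, 30, 42, 23],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Role': ['Housewife', 'Soldier', 'Tailor', 'Student'],
})

# include='all' asks describe() to summarise every column, not only the numeric ones.
summary = df.describe(include='all')

print(summary.loc['unique', 'Sex'])   # number of distinct values in the Sex column
print(summary.loc['mean', 'Age'])     # numeric statistics are still reported for Age
print(summary.loc['mean', 'Name'])    # NaN, because a mean is not defined for text
```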

# How to Read and Write Tabular Data

pandas provides the read_csv() function to read data stored in a CSV file into a pandas DataFrame. pandas supports many different file formats and data sources out of the box (CSV, Excel, JSON, SQL, ...), each with a reader function that carries the prefix read_.

When a large DataFrame is displayed, only the first and last 5 rows are shown by default.

In [12]:
iris = pd.read_csv('Iris.csv')
iris

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica
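Reading has a writing counterpart: each read_* function is matched by a to_* method on the DataFrame, such as `to_csv()`. The sketch below is a hedged illustration that avoids touching the disk. It assumes a tiny in-memory stand-in for `Iris.csv` (three rows and three of its columns), reads it with `read_csv()`, writes it back with `to_csv()`, and checks that nothing was lost.

```python
import io
import pandas as pd

# A tiny in-memory stand-in for a CSV file (an assumption for illustration).
csv_text = """Id,SepalLengthCm,Species
1,5.1,Iris-setosa
2,4.9,Iris-setosa
3,4.7,Iris-setosa
"""

# read_csv() accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)   # (rows, columns)

# to_csv() writes the table back out; index=False leaves out the row labels.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Reading the written text again should give an identical DataFrame.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(sample))
```

With a real file, the same calls take a path instead of a buffer, for example `sample.to_csv('sample.csv', index=False)`, where the file name is only a placeholder.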


To see the first n rows of a DataFrame, pass n to the head() method, as shown below.

In [13]:
iris.head(9)

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
5,6,5.4,3.9,1.7,0.4,Iris-setosa
6,7,4.6,3.4,1.4,0.3,Iris-setosa
7,8,5.0,3.4,1.5,0.2,Iris-setosa
8,9,4.4,2.9,1.4,0.2,Iris-setosa


To see the last n rows instead, use the tail() method.

In [14]:
iris.tail(9)

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
141,142,6.9,3.1,5.1,2.3,Iris-virginica
142,143,5.8,2.7,5.1,1.9,Iris-virginica
143,144,6.8,3.2,5.9,2.3,Iris-virginica
144,145,6.7,3.3,5.7,2.5,Iris-virginica
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica
149,150,5.9,3.0,5.1,1.8,Iris-virginica
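Both head() and tail() return 5 rows when called without an argument. A small self-contained sketch, using a made-up 20-row DataFrame with a single column `n` rather than the iris data:

```python
import pandas as pd

# A made-up table with the values 0..19 in one column.
numbers = pd.DataFrame({'n': range(20)})

print(len(numbers.head()))             # default number of rows from the top
print(len(numbers.tail()))             # default number of rows from the bottom
print(numbers.head(3)['n'].tolist())   # first three values
print(numbers.tail(3)['n'].tolist())   # last three values
```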


# dtypes Attribute
To check how pandas interpreted the data type of each column, request the dtypes attribute of the DataFrame.

In [15]:
iris.dtypes

Id                 int64
SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species           object
dtype: object
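The same attribute exists in singular form on a Series, and the data types can be used to pick columns. The sketch below rebuilds part of the small table of people so it runs on its own, reads the `dtype` of one column, and uses `select_dtypes(include='number')` to keep only the numeric columns. How text columns are reported (for example as `object`) can depend on the pandas version, so the snippet makes no claim about that.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Olivia', 'Nathan', 'Hannah', 'Noah'],
    'Age': [56, 30, 42, 23],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
})

print(df.dtypes)         # one data type per column
print(df['Age'].dtype)   # a single Series has one dtype

# Keep only the columns that hold numbers.
numeric_only = df.select_dtypes(include='number')
print(list(numeric_only.columns))
```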