# Using Pandas

Author: David Oury
<br>Created: 17 Nov 2016
<br>Modified: 17 Nov 2016

Import the `pandas` package as `pd`.

In [2]:
import pandas as pd

The `statsmodels` Python package makes datasets from R and other languages available in Python. 

- http://statsmodels.sourceforge.net/0.6.0/index.html

See the section [Using Datasets from R](http://statsmodels.sourceforge.net/0.6.0/datasets/index.html#using-datasets-from-r).

Import the package and use `sm` as its shortcut.

In [3]:
import statsmodels.api as sm

Retrieve the `iris` dataframe and use the `head` method to display its first five rows.

In [4]:
iris_df = sm.datasets.get_rdataset("iris").data
iris_df.head()

Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


The dataset is returned as a Pandas `DataFrame` object.

In [5]:
type(iris_df)

pandas.core.frame.DataFrame

The `columns` and `dtypes` attributes provide information about the columns of the dataframe.

In [6]:
iris_df.columns

Index(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width',
       'Species'],
      dtype='object')

In [7]:
iris_df.dtypes

Sepal.Length    float64
Sepal.Width     float64
Petal.Length    float64
Petal.Width     float64
Species          object
dtype: object
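
As a rough illustration of how these two attributes can be used together, the sketch below builds a small stand-in dataframe (`sample_df`, a hypothetical name, filled with the first rows shown earlier) and uses `select_dtypes` to keep only the numeric columns. It is one possible way to work with column types, not the only one.

```python
import pandas as pd

# Small stand-in for iris_df, built from the first rows displayed earlier.
sample_df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7],
    "Sepal.Width": [3.5, 3.0, 3.2],
    "Petal.Length": [1.4, 1.4, 1.3],
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"],
})

# columns is an Index of labels; tolist() turns it into a plain Python list.
column_names = sample_df.columns.tolist()

# dtypes is a Series that maps each column name to its type.
column_types = sample_df.dtypes

# select_dtypes filters columns by type; "number" keeps the float columns.
numeric_names = sample_df.select_dtypes(include="number").columns.tolist()

print(column_names)
print(numeric_names)
```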

The `info` method reports the number of rows, the number of columns, the column names, the column types and the amount of memory used to store the dataframe.

In [8]:
iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
Sepal.Length    150 non-null float64
Sepal.Width     150 non-null float64
Petal.Length    150 non-null float64
Petal.Width     150 non-null float64
Species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB
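
The `+` in the memory figure indicates a lower bound: by default the contents of `object` columns are not inspected. The sketch below, using a hypothetical stand-in dataframe `sample_df`, shows how the `deep=True` option of `memory_usage` (or `memory_usage="deep"` in `info`) can be used to request a fuller measurement.

```python
import pandas as pd

# Small stand-in for iris_df with one text column.
sample_df = pd.DataFrame({
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"],
})

# Per-column memory in bytes; the first entry is the index itself.
shallow = sample_df.memory_usage()

# deep=True also measures the strings stored in text columns.
deep = sample_df.memory_usage(deep=True)

print(shallow.sum(), deep.sum())

# info can report the deep figure directly.
sample_df.info(memory_usage="deep")
```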


## Retrieving rows and columns

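The `ix` indexer can be used to specify both the rows and the columns to retrieve from a dataframe. When the rows and the columns are each given as a list, the result is always a dataframe. Note that `ix` was deprecated in pandas 0.20 and removed in pandas 1.0; `iloc` (selection by position) and `loc` (selection by label) replace it.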

Below, the fifth and third rows of the dataframe are displayed (in that order). All columns are displayed because nothing is specified between the comma and the right bracket "]".

The `shape` attribute returns the dimensions (number of rows, number of columns) of the dataframe.

In [10]:
row5 = iris_df.ix[[4,2],]
print(type(row5))
print(row5.shape)
row5

<class 'pandas.core.frame.DataFrame'>
(2, 5)


Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
4,5.0,3.6,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
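
On versions of pandas where `ix` is not available, the same row selection can be sketched with `iloc` (by position) or `loc` (by label). The example below assumes a small stand-in dataframe, `sample_df` (a hypothetical name), holding the first five rows shown earlier.

```python
import pandas as pd

# Stand-in for iris_df: the first five rows displayed by head().
sample_df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["setosa"] * 5,
})

# iloc selects by integer position: fifth row, then third row, all columns.
rows_by_position = sample_df.iloc[[4, 2], :]

# loc selects by index label; with the default integer index the labels
# coincide with the positions, so the result is the same here.
rows_by_label = sample_df.loc[[4, 2], :]

print(type(rows_by_position))
print(rows_by_position.shape)
```

With a non-default index (for example, string labels), `iloc` and `loc` would no longer be interchangeable.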


Below, the fourth column of the dataframe is retrieved and stored in `col4`. Notice that the result is a dataframe with 150 rows and 1 column. The `tail` method is used to display the last five rows of `col4`.

In [13]:
col4 = iris_df.ix[:,[3]]
print(type(col4))
print(col4.shape)
col4.tail()

<class 'pandas.core.frame.DataFrame'>
(150, 1)


Unnamed: 0,Petal.Width
145,2.3
146,1.9
147,2.0
148,2.3
149,1.8
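
Whether a column selection comes back as a dataframe or as a `Series` depends on how the column is specified. The sketch below uses `iloc` and a hypothetical stand-in dataframe `sample_df` to contrast a list selector with a single integer.

```python
import pandas as pd

# Stand-in for iris_df: the first five rows displayed by head().
sample_df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["setosa"] * 5,
})

# A list of positions keeps the result two-dimensional (a dataframe).
as_frame = sample_df.iloc[:, [3]]

# A single position returns a one-dimensional Series.
as_series = sample_df.iloc[:, 3]

print(type(as_frame), as_frame.shape)
print(type(as_series), as_series.shape)
```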


## Retrieving a dataframe column as a `Series` object

A `Series` object is similar to a vector in R.

The `squeeze` method returns a `Series` object if the input is a single column or a single row of a dataframe. A dataframe with a single cell is squeezed to a scalar.

In [14]:
type(col4.squeeze())

pandas.core.series.Series

In [23]:
iris_df.ix[[3],[3]]

Unnamed: 0,Petal.Width
3,0.2


In [27]:
iris_df.ix[[3],list(range(4))].squeeze()

Sepal.Length    4.6
Sepal.Width     3.1
Petal.Length    1.5
Petal.Width     0.2
Name: 3, dtype: float64
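
The different outcomes of `squeeze` can be compared side by side. The sketch below uses `iloc` and a hypothetical stand-in dataframe `sample_df`; the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for iris_df: the first five rows displayed by head().
sample_df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["setosa"] * 5,
})

# One column: squeezed to a Series indexed by the row labels.
one_col = sample_df.iloc[:, [3]].squeeze()

# One row: squeezed to a Series indexed by the column names.
one_row = sample_df.iloc[[3], :4].squeeze()

# One cell: squeezed all the way down to a scalar.
one_cell = sample_df.iloc[[3], [3]].squeeze()

# Two rows and two columns: nothing to squeeze, still a dataframe.
block = sample_df.iloc[[2, 4], [1, 4]].squeeze()

for result in (one_col, one_row, one_cell, block):
    print(type(result))
```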

In [None]:
type(iris_df.Species)

In [None]:
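# Raises AttributeError: the column name 'Petal.Width' contains a dot,
# so attribute access looks for a column named 'Petal' instead.
type(iris_df.Petal.Width)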

In [None]:
iris_df.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Species']

In [None]:
type(iris_df.Petal_Width)
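
The cells above hinge on the difference between attribute access and bracket access. As a sketch, assuming a hypothetical stand-in dataframe `sample_df`, bracket access works for any column name, while attribute access needs a name that is a valid Python identifier. The `rename` call is one possible alternative to assigning `columns` directly.

```python
import pandas as pd

# Stand-in for iris_df with the original dotted column names.
sample_df = pd.DataFrame({
    "Petal.Length": [1.4, 1.4, 1.3],
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"],
})

# Bracket access works with any column name, dots included.
petal_width = sample_df["Petal.Width"]

# Attribute access fails here: there is no column named 'Petal'.
has_petal_attribute = hasattr(sample_df, "Petal")

# Replace dots with underscores so attribute access becomes possible.
renamed_df = sample_df.rename(columns=lambda name: name.replace(".", "_"))

print(has_petal_attribute)
print(type(renamed_df.Petal_Width))
```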

In [None]:
col2 = iris_df.ix[:,[1]]  # assumed definition: the second column, mirroring col4 above
col2.squeeze().head()

In [None]:
iris_df.ix[[3,5],[1,4]].squeeze()

In [None]:
iris_df.groupby('Species').head(2)

In [None]:
iris_df.groupby('Species')[['Sepal_Width','Sepal_Length']].mean()
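
To make the two `groupby` calls above concrete, the sketch below applies them to a small illustrative sample with two rows per species. The dataframe `sample_df` and its values are a hand-made stand-in, not the full dataset.

```python
import pandas as pd

# Small illustrative sample: two rows for each of three species.
sample_df = pd.DataFrame({
    "Sepal_Length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Sepal_Width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# head(1) on a groupby keeps the first row of every group,
# preserving the original row order and index labels.
first_rows = sample_df.groupby("Species").head(1)

# Selecting columns before mean() restricts the aggregation to them;
# the result has one row per species, indexed by species name.
means = sample_df.groupby("Species")[["Sepal_Width", "Sepal_Length"]].mean()

print(first_rows)
print(means)
```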

# Credit


Pandas for Everyone: Python Data Analysis
<br>by Daniel Y. Chen
<br>Publisher: Addison-Wesley Professional
<br>Release Date: September 2017
<br>ISBN: 9780134547046