# Pandas

## Import Libraries

In [2]:
import pandas as pd

## Series

A **pandas Series** is a *one-dimensional* labeled array that can hold data of any type (integer, string, float, Python objects, etc.), similar to a column in an Excel spreadsheet. The axis labels are collectively called the **index**.

In [3]:
print(pd.Series([1, 2, 3], index=['a', 'b', 'c'])) # with index

a    1
b    2
c    3
dtype: int64


In [4]:
print(pd.Series({'a': 1, 'b': 2, 'c': 3})) # from a dict

a    1
b    2
c    3
dtype: int64


If we don’t specify the index, pandas uses the integer positions starting from 0 by default.

In a Series, we can access a value directly by its index label:
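
As a small illustration of the default index, the sketch below builds a Series without passing `index`. The name `default_series` and its values are made up for this example.

```python
import pandas as pd

# No index is passed, so pandas assigns integer labels 0, 1, 2.
default_series = pd.Series([10, 20, 30])

print(default_series.index.tolist())  # [0, 1, 2]
print(default_series[0])              # 10 (the label 0 matches the first position here)
```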

In [5]:
series = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(series['a'])

1


Think of a Series as a one-dimensional NumPy array with an index (row labels).
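
One way to see this analogy is to pull the two parts of a Series apart. This is a minimal sketch; the variable names `labeled`, `values`, and `labels` are chosen only for illustration.

```python
import pandas as pd

labeled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

values = labeled.to_numpy()  # the data as a 1-D NumPy array
labels = labeled.index       # the row labels

print(values)        # [1 2 3]
print(values.ndim)   # 1
print(list(labels))  # ['a', 'b', 'c']
```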

## DataFrames

A DataFrame is a two-dimensional array with both row and column labels.

In [6]:
wine_dict = {
    'red_wine': [3, 6, 5],
    'white_wine': [5, 0, 10]
}
sales = pd.DataFrame(wine_dict, index=["adam", "bob", "charles"])
print(sales)

         red_wine  white_wine
adam            3           5
bob             6           0
charles         5          10


Think of a `DataFrame` as a collection of `Series`. Here, `sales` consists of two `Series`, one named "red_wine" and the other "white_wine", so we can access each Series by its column name:

In [7]:
print(sales['white_wine'])

adam        5
bob         0
charles    10
Name: white_wine, dtype: int64


If we don’t supply an index, the DataFrame will generate an integer index starting from 0.
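
The sketch below rebuilds the wine data without an `index` argument to show the generated integer index. The name `sales_default` is introduced only for this example.

```python
import pandas as pd

wine_dict = {
    'red_wine': [3, 6, 5],
    'white_wine': [5, 0, 10]
}

# No index is supplied, so the rows are labeled 0, 1, 2.
sales_default = pd.DataFrame(wine_dict)

print(sales_default.index.tolist())  # [0, 1, 2]
```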

### Inspect a DataFrame - Shape and Size

In [8]:
presidents_df = pd.read_csv(
    'https://sololearn.com/uploads/files/president_heights_party.csv',
    index_col='name'
)

In [9]:
print(presidents_df.shape)

(45, 4)


There are 45 rows and 4 columns in this DataFrame. To get the number of rows, we can access the first element of the tuple.

In [10]:
print("Number of rows:", presidents_df.shape[0])

Number of rows: 45


`.size` also works on a DataFrame. It returns an integer giving the total number of elements in the object, that is, the number of rows times the number of columns.

In [11]:
print(presidents_df.size)

180


Both attributes, `.shape` and `.size`, work the same way as they do on NumPy ndarrays.
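
To make the comparison with NumPy concrete, here is a minimal sketch that checks both attributes on an array and on a DataFrame built from it. The names `array` and `frame` and the small data set are assumptions for illustration, not part of the presidents data.

```python
import numpy as np
import pandas as pd

array = np.array([[3, 5], [6, 0], [5, 10]])
frame = pd.DataFrame(array, columns=['red_wine', 'white_wine'])

print(array.shape, frame.shape)  # (3, 2) (3, 2)
print(array.size, frame.size)    # 6 6

# size is the product of the two entries in shape.
rows, cols = frame.shape
print(rows * cols == frame.size)  # True
```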

### Inspect a DataFrame - Head and Tail

To see **the first few rows** of a DataFrame, use `.head()`. If we don’t specify `n` (the number of rows), it displays the first five rows by default. Here we want to see the top 3 rows.

In [13]:
print(presidents_df.head(n=3))

                   order  age  height                  party
name                                                        
George Washington      1   57     189                   none
John Adams             2   61     170             federalist
Thomas Jefferson       3   57     189  democratic-republican


In `presidents_df`, the index is the name of the president, and there are four columns: **order**, **age**, **height**, and **party**. Similarly, to see **the last few rows**, we can use `.tail()`, which also defaults to five rows.

In [14]:
print(presidents_df.tail())

                   order  age  height       party
name                                             
George H. W. Bush     41   64     188  republican
Bill Clinton          42   46     188  democratic
George W. Bush        43   54     182  republican
Barack Obama          44   47     185  democratic
Donald J. Trump       45   70     191  republican
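
The default of five rows and the effect of `n` can be checked without downloading anything. The sketch below uses a hypothetical ten-row DataFrame, `numbers_df`, made up for this purpose.

```python
import pandas as pd

# Hypothetical data: ten rows, so the default of five is visible.
numbers_df = pd.DataFrame({'value': range(10)})

print(len(numbers_df.head()))                  # 5
print(len(numbers_df.tail(n=2)))               # 2
print(numbers_df.tail(n=2)['value'].tolist())  # [8, 9]
```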


### Inspect a DataFrame - Info

Use `.info()` to get an overview of the DataFrame. Its output includes **index**, **column names**, **count of non-null values**, **dtypes**, and **memory usage**.

In [15]:
presidents_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 45 entries, George Washington to Donald J. Trump
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   order   45 non-null     int64 
 1   age     45 non-null     int64 
 2   height  45 non-null     int64 
 3   party   45 non-null     object
dtypes: int64(3), object(1)
memory usage: 1.8+ KB


The dtype of order, age, and height is `int64`, while party has dtype `object`, which here holds strings. The count of non-null values in each column equals the number of rows (45), indicating that there are no missing values.
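
The same two checks can also be made directly, without reading them off the `.info()` report. The sketch below is only an illustration: `sample_df` is a hypothetical three-row stand-in for `presidents_df`, with values copied from the `.head()` output above, so that it runs without network access.

```python
import pandas as pd

# Hypothetical three-row sample shaped like presidents_df.
sample_df = pd.DataFrame(
    {
        'order': [1, 2, 3],
        'age': [57, 61, 57],
        'height': [189, 170, 189],
        'party': ['none', 'federalist', 'democratic-republican'],
    },
    index=pd.Index(
        ['George Washington', 'John Adams', 'Thomas Jefferson'], name='name'
    ),
)

# Data type of each column.
print(sample_df.dtypes)

# Number of missing values per column; all zeros means nothing is missing.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)
```

The same calls should work unchanged on `presidents_df` itself.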