# Introduction to Pandas

Pandas is a high-level data manipulation package built on top of NumPy. Its key data structures are Series and DataFrames.


## Series

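A Series is a one-dimensional array with axis labels. Its values are stored in a single array (a NumPy ndarray by default), so a Series has exactly one dtype. Mixed inputs are upcast to a common dtype, or to the generic object dtype when no common numeric type fits.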
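
As a rough illustration of the single-dtype rule, the sketch below builds three Series from different inputs and inspects the dtype pandas chooses. The variable names are made up for this example.

```python
import pandas as pd

# Integers only: pandas picks an integer dtype
ints = pd.Series([1, 2, 3])

# Integers mixed with a float: every value is upcast to float64
mixed_numeric = pd.Series([1, 2.5, 3])

# Numbers mixed with a string: pandas falls back to the object dtype
mixed_objects = pd.Series([1, "two", 3.0])

print(ints.dtype)
print(mixed_numeric.dtype)
print(mixed_objects.dtype)
```

An object Series can hold arbitrary Python objects, but it still reports one dtype for the whole Series, and numeric operations on it are typically slower.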

In [9]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [5]:
x = pd.Series([10,20,30,40,50])
x

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [6]:
x.index

RangeIndex(start=0, stop=5, step=1)

In [7]:
x.values

array([10, 20, 30, 40, 50])

In [8]:
x.dtype

dtype('int64')

In [11]:
# Creating series with an index

data = [450, 650, 870]
sales = Series(data, index=["Don", "Mike", "Edwin"])

In [12]:
sales

Don      450
Mike     650
Edwin    870
dtype: int64

In [13]:
type(sales)

pandas.core.series.Series

In [14]:
sales.index

Index(['Don', 'Mike', 'Edwin'], dtype='object')

### Accessing Values

In [17]:
sales["Don"]

np.int64(450)

In [18]:
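# With a labeled index, an integer key in [] is treated as a position.
# This fallback is deprecated in recent pandas versions; prefer .iloc.
sales[0]
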
np.int64(450)

In [20]:
sales.iloc[0]

np.int64(450)
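
The cells above mix label-based and position-based access. A minimal sketch of the two explicit accessors, .loc for labels and .iloc for positions, might look like this. It rebuilds the same sales Series so that it runs on its own.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# .loc looks up by index label
by_label = sales.loc["Mike"]

# .iloc looks up by integer position (0-based)
by_position = sales.iloc[1]

# Both accept slices; a label slice includes its end label,
# while a position slice excludes its end position
label_slice = sales.loc["Don":"Mike"]
position_slice = sales.iloc[0:2]

print(by_label, by_position)
print(label_slice.equals(position_slice))
```

Using .loc and .iloc makes the intent explicit and avoids the ambiguity of passing an integer to plain square brackets.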

### Checking for Conditions

In [21]:
sales > 500

Don      False
Mike      True
Edwin     True
dtype: bool

In [22]:
sales[[False, True, True]]

Mike     650
Edwin    870
dtype: int64

In [23]:
sales[sales>500]

Mike     650
Edwin    870
dtype: int64
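
Boolean masks like the one above can also be combined. The sketch below assumes the same sales Series. The thresholds and variable names are arbitrary choices for illustration.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# Combine masks with & (and), | (or) and ~ (not).
# Each comparison needs its own parentheses because of operator precedence.
mid_range = sales[(sales > 500) & (sales < 800)]
extremes = sales[(sales < 500) | (sales > 800)]
not_high = sales[~(sales > 500)]

print(mid_range)
print(extremes)
print(not_high)
```

Python's plain and/or keywords do not work element-wise on a Series, which is why the bitwise operators are used here.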

In [25]:
"Sally" in sales

False

In [27]:
450 in sales
# 450 is a value, not an index label, so this outputs False

False
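
Since the in operator only looks at index labels, checking for a value needs a different approach. One possible sketch, using the underlying array and Series.isin, is shown here with the same sales data.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

label_found = "Don" in sales         # checks index labels
value_as_label = 450 in sales        # 450 is not a label
value_found = 450 in sales.values    # checks the underlying array of values
matches = sales.isin([450, 870])     # element-wise boolean Series

print(label_found, value_as_label, value_found)
print(matches)
```

The isin result is itself a boolean mask, so it can be used to filter the Series in the same way as a comparison.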

### Working with Dictionaries

In [29]:
# Converting series to dictionary

sales_dict = sales.to_dict()

In [30]:
sales_dict

{'Don': 450, 'Mike': 650, 'Edwin': 870}

In [31]:
# Converting dictionary to series
sales_ser = Series(sales_dict)
sales_ser

Don      450
Mike     650
Edwin    870
dtype: int64

### Adding entries and working with NaN/null values

In [35]:
# If an index label has no existing value, it is set to NaN by default.
# NaN is a float, so the dtype of the result changes from int64 to float64.
new_sales = Series(sales, index=["Don", "Mike", "Sally", "Edwin", "Lucy"])

In [36]:
new_sales

Don      450.0
Mike     650.0
Sally      NaN
Edwin    870.0
Lucy       NaN
dtype: float64

In [37]:
np.isnan(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool

In [38]:
# To check for null values, use pandas: unlike np.isnan, pd.isnull also works on non-numeric data
pd.isnull(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool
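
Once missing entries are detected, they usually need to be counted, filled, or dropped. The following is a small sketch of those three steps on the same new_sales data. Filling with 0 is only an illustrative choice, not a recommendation.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])
new_sales = pd.Series(sales, index=["Don", "Mike", "Sally", "Edwin", "Lucy"])

# Count the missing entries (isna is an alias of isnull)
missing_count = new_sales.isna().sum()

# Replace missing entries with a default value
filled = new_sales.fillna(0)

# Or keep only the entries that have a value
dropped = new_sales.dropna()

print(missing_count)
print(filled)
print(dropped)
```

Both fillna and dropna return a new Series and leave new_sales unchanged.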

### Naming Components in a Series

In [40]:
sales.index.name = "Sales Person"

In [41]:
sales

Sales Person
Don      450
Mike     650
Edwin    870
dtype: int64

In [42]:
# Naming a series
sales.name = "Total Sales"
sales

Sales Person
Don      450
Mike     650
Edwin    870
Name: Total Sales, dtype: int64

## DataFrames

DataFrames are two-dimensional, size-mutable, potentially heterogeneous tabular data structures. A DataFrame has two labeled axes: rows and columns.

### Creating a DataFrame

In [44]:
# From list
data = [["Adrian", 20], ["Bethany", 23], ["Chloe", 41]]

df = pd.DataFrame(data, columns=["Name", "Age"])

In [45]:
df

      Name  Age
0   Adrian   20
1  Bethany   23
2    Chloe   41
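
A DataFrame can also be built from a dictionary of lists. The sketch below uses the same names and ages to show what "heterogeneous" and "size-mutable" mean in practice. The IsAdult column and the threshold of 18 are hypothetical, added only for illustration.

```python
import pandas as pd

# From a dictionary of lists: the keys become column names
data = {"Name": ["Adrian", "Bethany", "Chloe"], "Age": [20, 23, 41]}
df = pd.DataFrame(data)

# Heterogeneous: each column keeps its own dtype
print(df.dtypes)

# Shape is (rows, columns)
print(df.shape)

# Size-mutable: assigning to a new column name adds the column in place
df["IsAdult"] = df["Age"] >= 18
print(df.shape)
```

Each column of a DataFrame is itself a Series, so the Series operations shown earlier apply to df["Age"] as well.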
