# Introduction to Pandas

---

Pandas is a high-level data manipulation package built on top of NumPy. Its two key data structures are the Series and the DataFrame.

## Series

A Series is a one-dimensional array with axis labels (an index).

In [3]:
# Importing libraries and packages
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [4]:
x = pd.Series([10,20,30,40,50])
x

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [5]:
# We can access different components separately:

# Accessing the index
x.index

RangeIndex(start=0, stop=5, step=1)

In [6]:
# And the values:
x.values

array([10, 20, 30, 40, 50])

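A Series is backed by a single array, so all of its values share one dtype. Mixed inputs are upcast to a common type, or stored under the generic object dtype when no narrower type fits.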
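To see how pandas settles on one dtype per Series, here is a minimal sketch. The variable names (ints, mixed_numbers, mixed_objects) are made up for illustration and do not appear elsewhere in this notebook.

```python
import pandas as pd

# All integers: pandas picks a single integer dtype
ints = pd.Series([10, 20, 30])

# An integer list containing one float is upcast to a common float dtype
mixed_numbers = pd.Series([10, 20.5, 30])

# Numbers mixed with a string fall back to the generic object dtype
mixed_objects = pd.Series([10, "twenty", 30.0])

print(ints.dtype)
print(mixed_numbers.dtype)
print(mixed_objects.dtype)
```

An object Series can hold arbitrary Python objects, but numeric operations on it are typically slower than on a numeric dtype.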

In [8]:
# Creating a Series with an Index
data = [450, 650, 870]
Sales = Series(data, index=['Don', 'Mike', 'Edwin'])
Sales

Don      450
Mike     650
Edwin    870
dtype: int64

In [9]:
# Check the type
type(Sales)

pandas.core.series.Series

In [10]:
# Because we supplied string labels, the index is now an Index holding those labels rather than a RangeIndex
Sales.index

Index(['Don', 'Mike', 'Edwin'], dtype='object')

### Accessing Values

In [11]:
# You can access values using the index name
Sales["Don"]

np.int64(450)
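Besides plain square brackets, pandas offers explicit accessors for label-based and position-based lookups. The sketch below rebuilds the same data under a hypothetical lowercase name (sales) so that it runs on its own.

```python
import pandas as pd

# Same data as the Sales Series above, rebuilt so the example is self-contained
sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# .loc looks up by index label
by_label = sales.loc["Mike"]

# .iloc looks up by integer position (0 is the first entry)
by_position = sales.iloc[0]

# Passing a list of labels returns a Series instead of a single value
subset = sales.loc[["Don", "Edwin"]]

print(by_label, by_position)
print(subset)
```

Using .loc and .iloc makes it explicit whether a lookup is by label or by position, which matters once the index itself contains integers.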

### Checking for Conditions

In [14]:
# A comparison is applied element-wise and returns a boolean Series
Sales>500

Don      False
Mike      True
Edwin     True
dtype: bool

In [15]:
# A list of booleans acts as a mask: only the entries marked True are kept
Sales[[False, True, True]]

Mike     650
Edwin    870
dtype: int64

In [18]:
Sales[Sales>500]

Mike     650
Edwin    870
dtype: int64
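Several conditions can be combined into a single mask. The following is a small sketch, assuming the same data under a hypothetical name (sales); note that each condition needs its own parentheses and that & and | are used instead of the Python keywords and/or.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# Both conditions must hold (element-wise AND)
mid_range = sales[(sales > 500) & (sales < 800)]

# Either condition may hold (element-wise OR)
extremes = sales[(sales < 500) | (sales > 800)]

print(mid_range)
print(extremes)
```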

In [19]:
# The in operator tests membership in the index labels
"Don" in Sales

True

In [20]:
# False example
"Howard" in Sales

False

In [21]:
450 in Sales
# 450 is a value, not an index label, so this returns False

False
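To test whether something appears among the values rather than the labels, one option is to check against the underlying values or to use isin. This is a sketch on the same data, with the hypothetical names sales, has_450 and flags.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# Membership test against the values instead of the index labels
has_450 = 450 in sales.values

# isin checks every element against a collection and returns a boolean Series
flags = sales.isin([450, 870])

print(has_450)
print(flags)
```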

### Working with Dictionaries

In [23]:
# Converting a Series to a dictionary
sales_dict = Sales.to_dict()
sales_dict

{'Don': 450, 'Mike': 650, 'Edwin': 870}

In [27]:
# Converting a dict to a Series
sales_ser = Series(sales_dict)
sales_ser

Don      450
Mike     650
Edwin    870
dtype: int64

### Adding entries and working with NaN/null values

In [28]:
# We can create a new Series from an existing Series
# If we specify names in the index that were NOT there already, NaN values will be assigned
new_sales = Series(Sales, index=["Don", "Mike", "Sally", "Edwin", "Lucy"])
new_sales

Don      450.0
Mike     650.0
Sally      NaN
Edwin    870.0
Lucy       NaN
dtype: float64

In [29]:
np.isnan(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool

In [30]:
# pd.isnull gives the same result here and, unlike np.isnan, also works on non-numeric data
pd.isnull(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool
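Once missing entries have been located, they are usually either filled in or dropped. The sketch below is one way to do that; it rebuilds the data with reindex, which here gives the same result as the Series(Sales, index=...) call above, and the names filled, dropped and missing_count are illustrative only.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# Labels that are not in the original index receive NaN
new_sales = sales.reindex(["Don", "Mike", "Sally", "Edwin", "Lucy"])

# Replace every NaN with a default value
filled = new_sales.fillna(0)

# Or remove the entries that are NaN
dropped = new_sales.dropna()

# Count how many entries are missing
missing_count = int(new_sales.isna().sum())

print(filled)
print(dropped)
print(missing_count)
```

Whether filling with 0 is appropriate depends on what a missing sale means in the data; it is used here only as an example default.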

In [34]:
# Give the Series itself a name
Sales.name = 'Total tv sales'
Sales

Don      450
Mike     650
Edwin    870
Name: Total tv sales, dtype: int64

In [36]:
# The index can be named as well; the name is shown above the labels
Sales.index.name = 'Sales person'
Sales

Sales person
Don      450
Mike     650
Edwin    870
Name: Total tv sales, dtype: int64

## DataFrames
A DataFrame is a two-dimensional, size-mutable tabular data structure whose columns can hold different data types. It has two labeled axes (rows and columns).

### Creating a DataFrame

In [39]:
# Creating a DataFrame from a list
data = [["Adrian", 20], ["Bethany", 23], ["Chloe", 41]]

# When we create a DataFrame, we can specify the column names (and, optionally, a data type via the dtype argument)
df = pd.DataFrame(data, columns=["Name", "Age"])
df

      Name  Age
0   Adrian   20
1  Bethany   23
2    Chloe   41
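A DataFrame can also be built from a dictionary that maps column names to lists, and the dtype of a column can be changed afterwards. The following is a minimal sketch using the same data; the name people is hypothetical.

```python
import pandas as pd

# Each dictionary key becomes a column name, each list becomes that column's values
people = pd.DataFrame({
    "Name": ["Adrian", "Bethany", "Chloe"],
    "Age": [20, 23, 41],
})

# Each column keeps its own dtype; astype converts a single column
people["Age"] = people["Age"].astype("float64")

print(people)
print(people.dtypes)
```

Each column of a DataFrame is itself a Series, so the Series operations shown earlier apply to people["Age"] as well.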
