# Introduction to Pandas
Pandas is a high-level data manipulation package built on top of NumPy. Its two key data structures are the Series and the DataFrame.

## Series

A Series is a one-dimensional array with axis labels (an index).

In [2]:
# Importing libraries and packages
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [4]:
# Creating a Series from a list
x = pd.Series([10,20,30,40,50])
x

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [5]:
# We can access different components separately:

# Accessing the index
x.index

RangeIndex(start=0, stop=5, step=1)

In [6]:
# Accessing values
x.values

array([10, 20, 30, 40, 50])

In [7]:
# Accessing the datatype
# A Series is backed by an array with a single dtype, so all of its values share one datatype
# (values of mixed types are stored under the generic object dtype)
x.dtype

dtype('int64')
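
As a small sketch of what a single dtype means in practice (the names `ints` and `mixed` are illustrative and not part of the notebook above): a Series always reports exactly one dtype, and when the values have mixed types pandas falls back to the generic `object` dtype instead of raising an error.

```python
import pandas as pd

# All integers: pandas picks an integer dtype
ints = pd.Series([10, 20, 30])

# Mixed int / str / float: pandas falls back to the object dtype
mixed = pd.Series([10, "twenty", 30.5])

print(ints.dtype)
print(mixed.dtype)

# Inside an object Series each element keeps its own Python type
element_types = [type(v).__name__ for v in mixed]
print(element_types)
```

An `object` Series works, but numeric operations on it are typically slower, so it is usually worth keeping columns to one real type.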

In [8]:
# Creating a series with an Index
data = [450, 650, 870]
Sales = Series(data, index=["Don", "Mike", "Edwin"])
Sales

Don      450
Mike     650
Edwin    870
dtype: int64

In [10]:
# Check the type
type(Sales)

pandas.core.series.Series

In [11]:
# Because the labels are strings, Sales.index shows the labels themselves rather than a RangeIndex.
Sales.index

Index(['Don', 'Mike', 'Edwin'], dtype='object')

### Accessing Values

In [12]:
# You can access values using the index name
Sales["Don"]

np.int64(450)

In [13]:
# You can still access values by position with .iloc
# (plain Sales[0] on a labeled Series is deprecated and triggers a FutureWarning)
Sales.iloc[0]

np.int64(450)
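
To make the difference between label-based and position-based access explicit, here is a minimal sketch using `.loc` and `.iloc`. The lowercase `sales` variable is just a self-contained copy of the Series above.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

by_label = sales.loc["Mike"]   # look up by index label
by_position = sales.iloc[1]    # look up by integer position (second element)

# Label slices include both endpoints; positional slices exclude the stop
label_slice = sales.loc["Don":"Mike"]
pos_slice = sales.iloc[0:2]

print(by_label, by_position)
print(label_slice.equals(pos_slice))
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers.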

### Checking for conditions

In [15]:
# Comparing a Series to a value returns a boolean Series (one True/False per element).
Sales>500

Don      False
Mike      True
Edwin     True
dtype: bool

In [16]:
# A list of booleans selects the elements at the positions marked True
Sales[[False, True, True]]

Mike     650
Edwin    870
dtype: int64

In [17]:
# If we want to see values greater than 500, we can use those booleans
Sales[Sales>500]

Mike     650
Edwin    870
dtype: int64
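
Boolean masks can also be combined. The sketch below assumes the same sales figures as above (in a standalone `sales` variable) and uses `&`, `|` and `~` for element-wise and, or and not. Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# Both conditions must hold
mid_range = sales[(sales > 500) & (sales < 800)]

# Either condition may hold
low_or_high = sales[(sales < 500) | (sales > 800)]

# Negate a condition
not_above_500 = sales[~(sales > 500)]

print(mid_range)
print(low_or_high)
print(not_above_500)
```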

In [18]:
# Checking the names in the index
"Don" in Sales

True

In [19]:
# False example
"Sally" in Sales

False

In [22]:
# What about this?
450 in Sales
# 450 is a value, not an index label, and `in` only checks the index labels. Thus, it returns False.

False
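
If the goal is to test whether a value (rather than a label) is present, one option is to check against the underlying values instead. A minimal sketch, again using a standalone `sales` copy:

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

in_labels = 450 in sales          # checks the index labels
in_values = 450 in sales.values   # checks the stored values

# isin() tests each element against a collection of candidate values
any_match = sales.isin([450, 999]).any()

print(in_labels, in_values, any_match)
```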

### Working with Dictionaries

In [24]:
# Converting a Series to a dictionary
sales_dict = Sales.to_dict()
sales_dict

{'Don': 450, 'Mike': 650, 'Edwin': 870}

In [25]:
# Converting a dict to a series
sales_ser = Series(sales_dict)
sales_ser

Don      450
Mike     650
Edwin    870
dtype: int64
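
When building a Series from a dictionary, an `index` argument can also be passed to pick and order the keys. The following is a small sketch with illustrative variable names (`ordered`, `round_trip`):

```python
import pandas as pd

sales_dict = {"Don": 450, "Mike": 650, "Edwin": 870}

# Only the requested keys are kept, in the requested order
ordered = pd.Series(sales_dict, index=["Edwin", "Don"])

# Converting to a Series and back returns an equal dictionary
round_trip = pd.Series(sales_dict).to_dict()

print(ordered)
print(round_trip == sales_dict)
```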

### Creating series from an existing one, and NaN values

In [27]:
# We can create a new Series from an existing Series
# If we specify index labels that were not there already, they are assigned NaN values
new_sales = Series(Sales, index=["Don", "Mike", "Sally", "Edwin", "lucy"])
new_sales

Don      450.0
Mike     650.0
Sally      NaN
Edwin    870.0
lucy       NaN
dtype: float64
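
Notice that the dtype changed from `int64` to `float64`: NaN is a floating-point value, so the integers are upcast. The sketch below shows the same operation with `reindex`, plus a `fill_value` alternative that avoids NaN. The variable names are illustrative.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])
names = ["Don", "Mike", "Sally", "Edwin", "lucy"]

# Missing labels become NaN, which forces a float dtype
with_nan = sales.reindex(names)

# Supplying a fill value keeps the integer dtype
with_zero = sales.reindex(names, fill_value=0)

print(with_nan.dtype)
print(with_zero.dtype)
```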

In [31]:
# We can check if there are any NaN values in a Series.
# One way is NumPy's isnan (it works on numeric data only)
np.isnan(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
lucy      True
dtype: bool

In [32]:
# pandas has its own check for missing values, which also works on non-numeric data
pd.isnull(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
lucy      True
dtype: bool
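
Once missing values are detected, they are usually counted, dropped, or filled. A minimal sketch, assuming a Series shaped like `new_sales` above and an illustrative `labels` Series of strings:

```python
import numpy as np
import pandas as pd

new_sales = pd.Series(
    [450.0, 650.0, np.nan, 870.0, np.nan],
    index=["Don", "Mike", "Sally", "Edwin", "lucy"],
)

n_missing = new_sales.isna().sum()   # count the missing entries
dropped = new_sales.dropna()         # remove them
filled = new_sales.fillna(0)         # or replace them with a default

# pd.isna also handles non-numeric data, where np.isnan typically raises a TypeError
labels = pd.Series(["a", None, "c"])
label_mask = pd.isna(labels).tolist()

print(n_missing, len(dropped), filled["Sally"])
print(label_mask)
```

`pd.isnull` is an alias of `pd.isna`, so either name can be used.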

### Naming components in a Series

In [35]:
# Naming the index
Sales.index.name = "Sales person"
Sales

Sales person
Don      450
Mike     650
Edwin    870
dtype: int64

In [37]:
# Naming a Series
Sales.name = "Total tv sales"
Sales

Sales person
Don      450
Mike     650
Edwin    870
Name: Total tv sales, dtype: int64
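
These names become useful when a Series is turned into a table. As a sketch (using a standalone `sales` copy), `reset_index()` moves the index into a regular column, and the index name and Series name become the column headers:

```python
import pandas as pd

sales = pd.Series(
    [450, 650, 870],
    index=["Don", "Mike", "Edwin"],
    name="Total tv sales",
)
sales.index.name = "Sales person"

# The index becomes the first column, the values the second
table = sales.reset_index()

print(list(table.columns))
print(table.shape)
```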

## DataFrames

A DataFrame is a two-dimensional, size-mutable (rows and columns can be added or removed), potentially heterogeneous (each column can have its own data type) tabular data structure. It has two labeled axes: the rows (index) and the columns.
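
The properties in this definition can be seen in a short sketch. The `people` DataFrame and the `Age_in_months` column are hypothetical names chosen for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["adrian", "bethany", "bob"], "Age": [20, 23, 33]}
)

# Heterogeneous: each column carries its own dtype
print(people.dtypes)

# Two labeled axes: the row index and the column labels
print(people.index)
print(people.columns)

# Size-mutable: adding a column changes the shape in place
shape_before = people.shape
people["Age_in_months"] = people["Age"] * 12
shape_after = people.shape
print(shape_before, shape_after)
```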

### Creating a DataFrame

In [39]:
# Creating a DataFrame from a list of lists (one inner list per row)
data = [["adrian", 20], ["bethany", 23], ["bob", 33]]

# When we create a DataFrame, we can specify the column names (and, optionally, a data type via dtype)
df = pd.DataFrame(data, columns=["Name", "Age"])
df

      Name  Age
0   adrian   20
1  bethany   23
2      bob   33
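
The `dtype` argument of the constructor applies one type to the whole table, so for per-column types a common approach is `astype` with a dictionary. The sketch below also builds the same table from a dictionary of columns; `from_dict` is an illustrative name.

```python
import pandas as pd

data = [["adrian", 20], ["bethany", 23], ["bob", 33]]
df = pd.DataFrame(data, columns=["Name", "Age"])

# Convert only the Age column to floating point
df = df.astype({"Age": "float64"})

# Equivalent table built column by column from a dictionary
from_dict = pd.DataFrame(
    {"Name": ["adrian", "bethany", "bob"], "Age": [20.0, 23.0, 33.0]}
)

print(df.dtypes)
print(from_dict.dtypes)
```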
