# Introduction to Pandas

Pandas is a high-level data manipulation package built on top of NumPy. Its two key data structures are the Series and the DataFrame.

## Series 

A Series is a one-dimensional array with axis labels (an index).

In [2]:
!pip install pandas



In [1]:
#importing libraries and packages
import numpy as np
import pandas as pd

In [2]:
x = pd.Series([10,20,30,40,50])
x

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [4]:
#accessing the index
x.index

RangeIndex(start=0, stop=5, step=1)

In [5]:
# Accessing values
x.values

array([10, 20, 30, 40, 50], dtype=int64)

In [6]:
#accessing the dtype
# a series is backed by a single array, so it is homogeneous and has exactly one dtype
x.dtype

dtype('int64')
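
As a small sketch of what a single dtype means in practice (the variable names below are illustrative and not part of the original notebook), mixing an integer with a float upcasts the whole series, while mixing numbers with strings falls back to the generic `object` dtype:

```python
import pandas as pd

# all integers: stored with an integer dtype
ints = pd.Series([10, 20, 30])

# a single float upcasts every value to float64
mixed_numeric = pd.Series([10, 20.5, 30])

# numbers and strings together are stored as generic Python objects
mixed_objects = pd.Series([10, "twenty", 30.0])

print(ints.dtype, mixed_numeric.dtype, mixed_objects.dtype)
```

The series still reports one dtype in every case; `object` simply means pandas could not find a narrower common type.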

In [12]:
#creating a series with an index
data = [450, 650, 870]
sales = pd.Series(data, index=['Don', 'Mike', 'Edwin'])
sales

Don      450
Mike     650
Edwin    870
dtype: int64

In [13]:
# check the type 
type(sales)

pandas.core.series.Series

In [14]:
# check index of sales
sales.index

Index(['Don', 'Mike', 'Edwin'], dtype='object')

### Accessing Values

In [15]:
# you can access values using the index name
sales['Don']

450

In [16]:
# access by position with .iloc (plain sales[0] is deprecated when the index holds labels)
sales.iloc[0]

450
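
Label-based and position-based access can be made explicit with `.loc` and `.iloc`. The sketch below rebuilds the `sales` series so that it runs on its own; the names `by_label`, `by_position`, and `subset` are only illustrative.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# .loc selects by index label
by_label = sales.loc["Mike"]

# .iloc selects by integer position (0 is the first entry)
by_position = sales.iloc[0]

# a list of labels returns a smaller series
subset = sales.loc[["Don", "Edwin"]]

print(by_label, by_position)
print(subset)
```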

### Checking for conditions

In [17]:
#filter based on conditions
sales>500

Don      False
Mike      True
Edwin     True
dtype: bool

In [18]:
# a list of booleans selects the entries marked True
sales[[False, True, True]]

Mike     650
Edwin    870
dtype: int64

In [19]:
#values greater than 500
sales[sales>500]

Mike     650
Edwin    870
dtype: int64
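
Several conditions can be combined into one mask. A minimal sketch, assuming the same `sales` data (the 800 threshold is made up for illustration):

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# combine conditions with & (and) or | (or);
# each condition needs its own parentheses
mid_range = sales[(sales > 500) & (sales < 800)]

# ~ negates a boolean mask
not_above_500 = sales[~(sales > 500)]

print(mid_range)
print(not_above_500)
```

Python's `and` and `or` keywords do not work element-wise on a series, which is why the `&` and `|` operators are used here.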

In [20]:
'Don' in sales

True

In [22]:
'Sally' in sales

False
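
Note that `in` tests the index labels, not the stored values. A short sketch of the difference, with value checks done through `.values` or `isin` (the variable names are illustrative):

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])

# `in` only looks at the index labels, so a stored value is not found
label_hit = "Don" in sales
value_as_label = 650 in sales

# to test the stored values, check against .values instead
value_hit = 650 in sales.values

# isin gives an element-wise boolean mask over the values
value_mask = sales.isin([650, 870])

print(label_hit, value_as_label, value_hit)
print(value_mask)
```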

### Working with dictionaries

In [23]:
#converting a series to dictionary
sales_dict = sales.to_dict()
sales_dict

{'Don': 450, 'Mike': 650, 'Edwin': 870}

In [25]:
#converting a dict to a series
sales_ser = pd.Series(sales_dict)
sales_ser

Don      450
Mike     650
Edwin    870
dtype: int64
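
When a dictionary is passed together with an `index`, the index decides which keys are kept and in what order. A minimal sketch, where the name `reordered` is illustrative:

```python
import pandas as pd

sales_dict = {"Don": 450, "Mike": 650, "Edwin": 870}

# only the listed keys are kept, in the order given by the index
reordered = pd.Series(sales_dict, index=["Edwin", "Don"])

print(reordered)
```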

### Adding entries and working with NaN/null values

In [26]:
# create a new series from an existing one by passing a new index;
# labels that are missing from the original series get NaN
new_sales = pd.Series(sales, index=['Don', 'Mike', 'Sally', 'Edwin', 'Lucy'])
new_sales

Don      450.0
Mike     650.0
Sally      NaN
Edwin    870.0
Lucy       NaN
dtype: float64

In [27]:
#check if any NaN values in the series
np.isnan(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool

In [28]:
#check for null values
pd.isnull(new_sales)

Don      False
Mike     False
Sally     True
Edwin    False
Lucy      True
dtype: bool
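
Once missing entries are located, they are usually counted, dropped, or filled. The sketch below shows one possible way to do each; treating a missing entry as zero sales is an assumption made purely for illustration.

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])
new_sales = pd.Series(sales, index=["Don", "Mike", "Sally", "Edwin", "Lucy"])

# count the missing entries
missing_count = int(new_sales.isna().sum())

# drop the entries that have no value
known_only = new_sales.dropna()

# assumption for illustration: a missing entry means no sales
filled = new_sales.fillna(0)

print(missing_count)
print(known_only)
print(filled)
```

Neither `dropna` nor `fillna` changes `new_sales` itself; each returns a new series.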

### Naming components in a series

In [29]:
#name an index
sales.index.name = 'sales person'
sales

sales person
Don      450
Mike     650
Edwin    870
dtype: int64

In [30]:
#naming a series
sales.name = 'total tv sales'
sales

sales person
Don      450
Mike     650
Edwin    870
Name: total tv sales, dtype: int64
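
These names carry over when a series is turned into a table. A minimal sketch using `reset_index` (the name `table` is illustrative):

```python
import pandas as pd

sales = pd.Series([450, 650, 870], index=["Don", "Mike", "Edwin"])
sales.index.name = "sales person"
sales.name = "total tv sales"

# reset_index moves the index into a regular column;
# the index name and the series name become the column labels
table = sales.reset_index()

print(table.columns.tolist())
```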

## DataFrames

DataFrames are two-dimensional, size-mutable, potentially heterogeneous tabular data structures. A DataFrame has two labelled axes: rows and columns.
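
The properties in this definition can be inspected directly. The sketch below builds a small frame from a dictionary of lists; the `Country` column is made up to illustrate size mutability.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Adrian", "Bethany", "Chloe"],
                       "Age": [20, 23, 41]})

# two labelled axes: the row index and the column labels
print(people.index)
print(people.columns)

# heterogeneous: each column keeps its own dtype
print(people.dtypes)

# every column is itself a Series
age_column = people["Age"]

# size-mutable: a new column can be added in place
people["Country"] = ["KE", "UG", "TZ"]
print(people.shape)
```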

### Creating a DataFrame

In [35]:
#Creating a dataframe from a list

data = [["Adrian" , 20], ["Bethany", 23], ["Chloe", 41]]

#to create a dataframe, pass the data and specify the column names
df = pd.DataFrame(data, columns=["Name", "Age"])
df

      Name  Age
0   Adrian   20
1  Bethany   23
2    Chloe   41
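
A list of rows is only one possible input. As a sketch of an alternative, a list of dictionaries yields the same frame, with the column names taken from the dictionary keys (`records` and `df_from_records` are illustrative names):

```python
import pandas as pd

# list of rows, with the column names given separately
data = [["Adrian", 20], ["Bethany", 23], ["Chloe", 41]]
df = pd.DataFrame(data, columns=["Name", "Age"])

# list of dictionaries: the keys become the column names
records = [
    {"Name": "Adrian", "Age": 20},
    {"Name": "Bethany", "Age": 23},
    {"Name": "Chloe", "Age": 41},
]
df_from_records = pd.DataFrame(records)

# both constructions produce the same table
print(df.equals(df_from_records))
```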
