# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Create Series and DataFrames
* Perform basic operations with Series
* Explore DataFrame basics
* Select data from DataFrames
* Apply functions to Series and DataFrames

## Loading Libraries

In [1]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd 

## What is a Pandas Series?

* A **one-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, and *Python objects*.
* A Pandas Series is like a column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series has a single data type (integer, float, string, etc.), which keeps the data homogeneous. Values of mixed types are stored under the general `object` dtype.
* **Labeled Index**: Each element in a Series is associated with a label, and the labels together form the *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, Pandas automatically aligns the data on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice, and other data sources.
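
To make the vectorized-operation and alignment features above concrete, here is a minimal sketch. The Series `prices` and `discounts` and their values are made up for illustration and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical data: the same labels, stored in a different order
prices = pd.Series([100, 200, 300], index=["a", "b", "c"])
discounts = pd.Series([10, 20, 30], index=["c", "b", "a"])

# Vectorized operation: the multiplication is applied to every element at once
doubled = prices * 2

# Alignment: values are matched by index label, not by position,
# so "a" pairs 100 with 30 even though they sit at opposite ends
net = prices - discounts

print(doubled.tolist())                # [200, 400, 600]
print(net["a"], net["b"], net["c"])    # 70 180 290
```

If the subtraction were done by position instead, label `a` would give 90 rather than 70, which is why alignment matters when two Series are not stored in the same order.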

In [2]:
# example of a series from a list 
marks = [10, 20, 33, 42, 19, 30]

# series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying

In [3]:
# example 1 - Creating a series from a list
data = [10.5, 11.2, 10.7, 9.9, 10.2]

# series
list_series = pd.Series(data, name="Student Marks")
list_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [4]:
# data type 
type(list_series)

pandas.core.series.Series

In [5]:
# example 2 - Creating a series from a NumPy Array
data_arr = np.array(data) # created an array from a list

type(data_arr)

numpy.ndarray

In [6]:
# series from array
arr_series = pd.Series(data_arr, name="Array Series")
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [7]:
# example 3 - Creating a series from a dictionary
data_dict = {
    "Prof" : 100,
    "Dominic" : 250,
    "Carol" : 300, 
    "Eve" : 450
}

type(data_dict)

dict

In [22]:
data_dict.keys()

dict_keys(['Prof', 'Dominic', 'Carol', 'Eve'])

In [8]:
# series from dict
dict_series = pd.Series(data_dict, name="Sky Team")
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [9]:
# series with custom index labels
balance = [1000, 1500, 2000, 4000] # data to store in the series
custom_labels = ['A', 'B', 'C', 'D'] # custom indexes

custom_label_series = pd.Series(data = balance, index=custom_labels, name='Balances')
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

## Basic Operations With Series

In [10]:
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [11]:
# accessing elements in a series 
print(arr_series[3])

9.9


In [12]:
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [13]:
# accessing elements in a series 
print(dict_series['Carol'])

300


In [14]:
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

In [15]:
# accessing elements in a series with a label slice (the end label 'D' is included)
print(custom_label_series['B':'D'])

B    1500
C    2000
D    4000
Name: Balances, dtype: int64
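
The slice above includes the end label `'D'`, which differs from ordinary Python slicing. The sketch below contrasts label-based selection with position-based selection using the explicit `.loc` and `.iloc` accessors; it rebuilds the balances Series under the assumed name `balances` so that it runs on its own.

```python
import pandas as pd

# Same data as custom_label_series, rebuilt so the example is self-contained
balances = pd.Series([1000, 1500, 2000, 4000],
                     index=["A", "B", "C", "D"], name="Balances")

by_label = balances.loc["B":"D"]   # label slice: the end label "D" is included
by_position = balances.iloc[1:3]   # positional slice: position 3 is excluded

print(by_label.tolist())      # [1500, 2000, 4000]
print(by_position.tolist())   # [1500, 2000]
```

Using `.loc` for labels and `.iloc` for positions makes the intent explicit, which helps most when the index itself is made of integers.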


In [16]:
# arithmetic operations
# divide every balance by 100 in one vectorized step (no loop needed)
x = custom_label_series / 100
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [17]:
# filter elements 
x_filtered = x[x >= 15]
x_filtered

B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64
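
Filtering works because the comparison `x >= 15` itself produces a Series of `True`/`False` values, which is then used as a mask. The sketch below shows the mask on its own and one way to combine two conditions; the Series is rebuilt here so the block is self-contained, and `mask` and `in_range` are illustrative names.

```python
import pandas as pd

# Same values as x above, rebuilt so the example runs on its own
x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="Balances")

mask = x >= 15            # Boolean Series, one True/False per label
print(mask.tolist())      # [False, True, True, True]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
in_range = x[(x >= 15) & (x < 40)]
print(in_range.tolist())  # [15.0, 20.0]
```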

In [18]:
# basic summary statistics
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [19]:
# mean 
mean = x.mean()
print(mean)

21.25


In [20]:
# std 
std = x.std()
print(std)

13.149778198382917


In [21]:
# max (stored as max_value so the built-in max() is not shadowed)
max_value = x.max()
print(max_value)

40.0
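
The individual statistics above can also be collected in one call with `describe()`. This is a small sketch on the same values, rebuilt here so it runs on its own; `summary` is an illustrative name.

```python
import pandas as pd

# Same values as x above
x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="Balances")

# describe() returns a Series holding count, mean, std, min, quartiles and max
summary = x.describe()
print(summary)

# Individual entries are looked up by label
print(summary["mean"])   # 21.25
print(summary["max"])    # 40.0
```

Note that `std()` in Pandas computes the sample standard deviation (it divides by `n - 1`), which is why the value printed earlier is about 13.15 rather than the population value.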


## Applying Functions to a Series 

### Lambda Functions

* A small anonymous function, i.e. one that is not bound to a name when it is defined.
* Similar to a user-defined function, but without a name.
* Simple and straightforward: it needs only the keyword `lambda`, the argument(s), and an expression.
* The body is a single expression, so the whole function fits on one line.

```
def func_name(parameters):
    code block
    
    return return_value
```

`func = lambda parameters: return_value`

* `lambda` : Keyword that indicates definition of a lambda function.
* `parameters`: The input parameters that the lambda function will take.
* `return_value`: A single expression that defines the computation the lambda function performs and the value it returns.
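
As a quick illustration of the syntax described above, the sketch below defines the same two-parameter function both ways and then passes a lambda directly as an argument, which is how lambdas are most often used. The names `add`, `add_lambda`, and `by_length` are made up for this example.

```python
# A named function and the equivalent lambda, both taking two parameters
def add(a, b):
    return a + b

add_lambda = lambda a, b: a + b

print(add(2, 3))          # 5
print(add_lambda(2, 3))   # 5

# A lambda passed inline as the sort key: order the names by their length
names = ["Dominic", "Eve", "Carol", "Prof"]
by_length = sorted(names, key=lambda name: len(name))
print(by_length)          # ['Eve', 'Prof', 'Carol', 'Dominic']
```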

In [24]:
# let's compare the two
# a regular function that squares a number
def square_numbers(a):
    """Function to square numbers"""
    value = a**2

    return value

square_numbers(10)

100

In [25]:
# lambda function
square_lambda = lambda x: x**2
square_lambda(5)

25

In [31]:
# original
def print_hello():
    y = 'Hello World!'
    
    return y
print_hello()

'Hello World!'

In [30]:
# lambda function
y_lmbda = lambda : 'Hello World!'
y_lmbda()

'Hello World!'

In [33]:
def even_odd(number:int):
    if (number % 2 == 0):
        return 'Even'
    else:
        return 'Odd'
even_odd(7)

'Odd'

In [34]:
even_odd_number = lambda number: 'Even' if number % 2 == 0 else 'Odd'
even_odd_number(7)

'Odd'

In [35]:
even_odd_number1 = lambda number: f"{number} is an Even number" if number % 2 == 0 else f"{number} is an Odd number"
even_odd_number1(9)

'9 is an Odd number'

### Generate Random Numbers

* We use the `NumPy` library to generate random numbers.


In [36]:
# generate 120 random integers from 3 up to (but not including) 99
my_numbers = np.random.randint(3, 99, size = 120)

In [37]:
# display random numbers 
my_numbers

array([60, 97, 43, 90, 71, 77, 20, 95, 88, 80, 12,  7, 49, 71, 73, 28,  3,
       44, 88, 49, 31, 57, 71, 28, 39, 58, 43, 96, 94, 77, 67, 12, 19, 20,
       44, 77, 80, 57, 35, 57, 53, 37, 97, 47, 52, 68, 48, 68, 21, 48, 55,
       62, 54, 11, 55, 15, 48, 41, 69, 37, 68, 12, 12, 24, 66, 76, 31, 55,
       20, 46, 63, 37, 49, 78, 81, 17, 48, 32, 67, 59, 87, 28, 44, 90, 21,
       75, 14, 61, 11, 38, 46, 26, 24, 21, 88, 37, 76, 90, 85, 87, 59, 48,
       58, 32, 39,  7, 16, 80, 67, 41, 68, 39,  3, 89, 62, 40, 27, 34, 91,
       90])
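
Because the numbers are random, rerunning the cell above gives a different array each time. If reproducible values are wanted, one option is a seeded generator, sketched below. This uses NumPy's `default_rng` interface rather than the `np.random.randint` call in the notebook, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator: the same seed always yields the same sequence
rng = np.random.default_rng(seed=42)
sample = rng.integers(3, 99, size=120)   # the upper bound 99 is excluded

# A second generator with the same seed reproduces the sample exactly
rng_again = np.random.default_rng(seed=42)
same_sample = rng_again.integers(3, 99, size=120)

print(np.array_equal(sample, same_sample))     # True
print(sample.min() >= 3 and sample.max() < 99) # True
```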

In [38]:
# create a series 
my_numbers_ser = pd.Series(my_numbers, name="Numbers")
my_numbers_ser 

0      60
1      97
2      43
3      90
4      71
       ..
115    40
116    27
117    34
118    91
119    90
Name: Numbers, Length: 120, dtype: int32

In [39]:
# display the first five rows of the series
my_numbers_ser.head()

0    60
1    97
2    43
3    90
4    71
Name: Numbers, dtype: int32

In [40]:
# display last five rows
my_numbers_ser.tail()

115    40
116    27
117    34
118    91
119    90
Name: Numbers, dtype: int32

In [41]:
# display last 7 rows
my_numbers_ser.tail(7)

113    89
114    62
115    40
116    27
117    34
118    91
119    90
Name: Numbers, dtype: int32

### Using the `apply()` Function in a Series

* `apply()` is a powerful way to transform and analyze the data within a Series.
* Above, we generated a Series of random numbers and defined a function called `square_numbers` that takes a number, squares it, and returns the result. Let's apply that function to the Series.

In [43]:
# square the series random numbers 
squared_numbers = my_numbers_ser.apply(square_numbers)
squared_numbers.tail()

115    1600
116     729
117    1156
118    8281
119    8100
Name: Numbers, dtype: int64

In [45]:
# use .rename to rename the series
# inplace=True modifies squared_numbers itself instead of returning a renamed copy
squared_numbers.rename('Squared Numbers', inplace=True)
squared_numbers.head()

0    3600
1    9409
2    1849
3    8100
4    5041
Name: Squared Numbers, dtype: int64
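
Two small variations on the cells above, sketched on a three-value sample so the block runs on its own. First, simple arithmetic such as squaring can also be written as a vectorized expression, which gives the same values as `apply()`. Second, `rename()` without `inplace=True` returns a new Series and leaves the original name untouched. The variable names here are illustrative.

```python
import pandas as pd

# A small stand-in for my_numbers_ser
numbers = pd.Series([60, 97, 43], name="Numbers")

def square_numbers(a):
    """Square a single number."""
    return a ** 2

via_apply = numbers.apply(square_numbers)   # calls the function per element
via_operator = numbers ** 2                 # vectorized equivalent

print(via_apply.tolist())      # [3600, 9409, 1849]
print(via_operator.tolist())   # [3600, 9409, 1849]

# rename() returns a renamed copy unless inplace=True is passed
renamed = via_apply.rename("Squared Numbers")
print(via_apply.name)   # Numbers
print(renamed.name)     # Squared Numbers
```

`apply()` remains the tool to reach for when the transformation cannot be expressed with built-in operators.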

### `lambda` function with `apply()`

In [47]:
# Cube the numbers using lambda and apply
cubed_numbers = my_numbers_ser.apply(lambda j: j**3)
cubed_numbers.head()

0    216000
1    912673
2     79507
3    729000
4    357911
Name: Numbers, dtype: int64

In [48]:
# rename the series
cubed_numbers.rename("Cubed Numbers", inplace=True)
cubed_numbers.head()

0    216000
1    912673
2     79507
3    729000
4    357911
Name: Cubed Numbers, dtype: int64

### Using the `map()` Function in a series

* `map()` substitutes each value in a Series with another value, which makes it a convenient way to transform the values in a Series.

In [49]:
def bmi_value(x):
    if (x <= 43):
        return 'Underweight'
    else:
        return 'Overweight'
bmi_value(20)

'Underweight'

In [51]:
# map our random numbers as underweight or overweight
bmi_series = my_numbers_ser.map(bmi_value)
bmi_series.head(10)

0     Overweight
1     Overweight
2    Underweight
3     Overweight
4     Overweight
5     Overweight
6    Underweight
7     Overweight
8     Overweight
9     Overweight
Name: Numbers, dtype: object

In [53]:
bmi_series.rename("BMI", inplace=True)
bmi_series.head()

0     Overweight
1     Overweight
2    Underweight
3     Overweight
4     Overweight
Name: BMI, dtype: object
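
Besides a function, `map()` also accepts a dictionary, which is handy when the substitution is a fixed lookup. The sketch below uses made-up grade data (`grades` and `points` are not part of this notebook); values with no matching key become `NaN`.

```python
import pandas as pd

# Hypothetical data: letter grades and a lookup table of grade points
grades = pd.Series(["A", "B", "C", "B", "F"], name="Grade")
points = {"A": 4, "B": 3, "C": 2}

# Each value is replaced by its dictionary entry; "F" has no entry, so it becomes NaN
mapped = grades.map(points)
print(mapped)
print(mapped.isna().tolist())   # [False, False, False, False, True]
```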

### `lambda` function with `map()`

In [54]:
# use a lambda function with map() to double each number
double_number = my_numbers_ser.map(lambda y: y*2)
double_number.head()

0    120
1    194
2     86
3    180
4    142
Name: Numbers, dtype: int64

In [55]:
# rename the series
double_number.rename('Double Number', inplace=True)
double_number.head()

0    120
1    194
2     86
3    180
4    142
Name: Double Number, dtype: int64

### `lambda` function with Conditional Statement

In [57]:
# are the random numbers even or odd
even_odd_series = my_numbers_ser.apply(lambda k: 'Even' if (k % 2 == 0) else 'Odd')
even_odd_series.head()                                 

0    Even
1     Odd
2     Odd
3    Even
4     Odd
Name: Numbers, dtype: object

In [58]:
# rename the series
even_odd_series.rename("Even/Odd", inplace=True)  
even_odd_series.head()  

0    Even
1     Odd
2     Odd
3    Even
4     Odd
Name: Even/Odd, dtype: object
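
For a simple two-way condition like this one, a vectorized alternative to `apply()` with a lambda is NumPy's `np.where`. The sketch below is not what the notebook uses; it reproduces the first five labels above on a small sample, with `numbers` and `labels` as illustrative names.

```python
import numpy as np
import pandas as pd

# The first five values of the random Series shown earlier
numbers = pd.Series([60, 97, 43, 90, 71], name="Numbers")

# np.where(condition, value_if_true, value_if_false) works on the whole array at once
labels = pd.Series(np.where(numbers % 2 == 0, "Even", "Odd"),
                   index=numbers.index, name="Even/Odd")

print(labels.tolist())   # ['Even', 'Odd', 'Odd', 'Even', 'Odd']
```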

## Series to DataFrame 

* If a **Series** is a *table* with a single column, then a **DataFrame** is a *table* with multiple columns, where each column is a Series.

In [59]:
# let's combine the series we created into a dataframe
print(my_numbers_ser.name)
print(squared_numbers.name)

Numbers
Squared Numbers


In [60]:
# Creating a DataFrame
test_df = pd.DataFrame({
    my_numbers_ser.name: my_numbers_ser,
    double_number.name: double_number,
    squared_numbers.name: squared_numbers,
    bmi_series.name: bmi_series,
    even_odd_series.name: even_odd_series  
})
test_df.head(9)

| | Numbers | Double Number | Squared Numbers | BMI | Even/Odd |
|---|---|---|---|---|---|
| 0 | 60 | 120 | 3600 | Overweight | Even |
| 1 | 97 | 194 | 9409 | Overweight | Odd |
| 2 | 43 | 86 | 1849 | Underweight | Odd |
| 3 | 90 | 180 | 8100 | Overweight | Even |
| 4 | 71 | 142 | 5041 | Overweight | Odd |
| 5 | 77 | 154 | 5929 | Overweight | Odd |
| 6 | 20 | 40 | 400 | Underweight | Even |
| 7 | 95 | 190 | 9025 | Overweight | Odd |
| 8 | 88 | 176 | 7744 | Overweight | Even |
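
As an alternative to the dictionary approach, named Series can also be placed side by side with `pd.concat`, and the resulting DataFrame can then be queried by column or by condition. This is a minimal sketch on a three-row sample; the variable names are illustrative.

```python
import pandas as pd

# A small stand-in for the Series built earlier
numbers = pd.Series([60, 97, 43], name="Numbers")
doubled = (numbers * 2).rename("Double Number")
squared = (numbers ** 2).rename("Squared Numbers")

# axis=1 places the Series side by side; each Series name becomes a column name
df = pd.concat([numbers, doubled, squared], axis=1)
print(df.columns.tolist())   # ['Numbers', 'Double Number', 'Squared Numbers']

one_column = df["Numbers"]                         # a single column is a Series
two_columns = df[["Numbers", "Squared Numbers"]]   # a list of names gives a DataFrame
large_rows = df.loc[df["Numbers"] > 50]            # keep rows where the condition holds

print(large_rows["Squared Numbers"].tolist())      # [3600, 9409]
```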


## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To assist your clients in making informed decisions about property investment, you decide to analyze property data using Pandas. 
1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first and last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage.
7. Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'.