# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Creating Series and DataFrames
* Basic Operations with Series 
* Exploring DataFrame Basics
* Selecting Data from DataFrames
* Applying Functions to Series and DataFrames

## Loading Libraries

In [5]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd 

## What is a Pandas Series?

* A **one-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, *Python objects*, etc.
* A pandas Series is like a single column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series has a single data type (dtype), such as integer, float or string (`object`), which applies to every element in the Series.
* **Labeled Index**: Each element in a Series is associated with a label called an *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, pandas automatically aligns the data by index label, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice (a single row or column) and other data sources.
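A minimal sketch of the vectorized-operation and alignment points above, using small made-up Series (the names `prices`, `s1` and `s2` are illustrative only):

```python
import pandas as pd

# Vectorized operation: the multiplication is applied to every element, no loop needed
prices = pd.Series([100, 200, 300], index=["a", "b", "c"])
doubled = prices * 2

# Alignment: values are matched by index label, not by position.
# Labels that appear in only one of the two Series produce NaN.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2

print(doubled.tolist())  # [200, 400, 600]
print(total)
```

Here `total` has the labels `a` to `d`; only `b` and `c` exist in both Series, so only they get a numeric sum.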

In [6]:
# example of a series from a list 
marks = [10, 20, 33, 42, 19, 30]

# series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying

In [7]:
# example 1 - Creating a series from a list
data = [10.5, 11.2, 10.7, 9.9, 10.2]

# series
list_series = pd.Series(data, name="Student Marks")
list_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [8]:
# data type 
type(list_series)

pandas.core.series.Series

In [9]:
# example 2 - Creating a series from a NumPy Array
data_arr = np.array(data) # created an array from a list

type(data_arr)

numpy.ndarray

In [10]:
# series from array
arr_series = pd.Series(data_arr, name="Array Series")
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [11]:
# example 3 - Creating a series from a dictionary (the keys become the index labels)
data_dict = {
    "Prof" : 100,
    "Dominic" : 250,
    "Carol" : 300, 
    "Eve" : 450
}

type(data_dict)

dict

In [12]:
# series from dict
dict_series = pd.Series(data_dict, name="Sky Team")
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [13]:
dict_series['Carol']

300

In [14]:
# series with custom index labels
balance = [1000, 1500, 2000, 4000] # data to store in the series
custom_labels = ['A', 'B', 'C', 'D'] # custom indexes

custom_label_series = pd.Series(data = balance, index=custom_labels, name='Balances')
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

## Basic Operations With Series

In [15]:
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [16]:
# accessing elements in a series 
print(arr_series[3])

9.9


In [17]:
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [18]:
# accessing elements in a series 
print(dict_series['Carol'])

300


In [19]:
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

In [20]:
# accessing elements in a series with a label slice from 'B' to 'D' (both included)
print(custom_label_series['B':'D'])

B    1500
C    2000
D    4000
Name: Balances, dtype: int64
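Plain `[]` indexing can mean a label or a position depending on the index. A small sketch contrasting label-based `.loc` with position-based `.iloc`, rebuilding the balances Series so the block runs on its own:

```python
import pandas as pd

balances = pd.Series([1000, 1500, 2000, 4000], index=["A", "B", "C", "D"], name="Balances")

# .loc selects by label; a label slice includes BOTH endpoints
by_label = balances.loc["B":"D"]

# .iloc selects by integer position; a position slice excludes the stop position
by_position = balances.iloc[1:3]

print(by_label.tolist())     # [1500, 2000, 4000]
print(by_position.tolist())  # [1500, 2000]
```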


In [21]:
# arithmetic operations
# scale the balances: divide every element by 100 in one vectorized operation
x = custom_label_series / 100
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [22]:
# filter elements with a boolean condition: keep values >= 15
x_filtered = x[x >= 15]
x_filtered

B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64
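The condition `x >= 15` is itself a Series of booleans (a mask). A sketch of inspecting the mask and combining conditions, re-creating `x` from the output above:

```python
import pandas as pd

x = pd.Series([10.0, 15.0, 20.0, 40.0], index=["A", "B", "C", "D"], name="Balances")

# A comparison produces a boolean Series (the "mask")
mask = x >= 15
print(mask.tolist())  # [False, True, True, True]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
between = x[(x >= 15) & (x < 40)]
outside = x[(x < 15) | (x >= 40)]

print(between.tolist())  # [15.0, 20.0]
print(outside.tolist())  # [10.0, 40.0]
```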

In [23]:
# basic summary statistics
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [24]:
# mean 
mean = x.mean()
print(mean)

21.25


In [25]:
# std 
std = x.std()
print(std)

13.149778198382917


In [26]:
# max
max_value = x.max()  # avoid naming it `max`, which would shadow the built-in max()
print(max_value)

40.0
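These statistics (and a few more) can be gathered in one call with `describe()`. Note that `Series.std()` computes the *sample* standard deviation (`ddof=1`) by default, which is why the value above is about 13.15. A sketch, re-creating `x`:

```python
import pandas as pd

x = pd.Series([10.0, 15.0, 20.0, 40.0], index=["A", "B", "C", "D"], name="Balances")

# describe() returns count, mean, std, min, the 25%/50%/75% quartiles and max
summary = x.describe()
print(summary)

# std() defaults to ddof=1 (sample standard deviation);
# ddof=0 gives the population standard deviation instead
sample_std = x.std()            # about 13.1498
population_std = x.std(ddof=0)  # about 11.388
```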


## Applying Functions to a Series 

### Lambda Functions

* A small anonymous function: like a user-defined function, but not bound to a name.
* It's simple and straightforward, requiring only the keyword `lambda`, the argument(s) and a single expression.
* Its body is limited to one expression, so it fits on a single line of code.

```
def func_name(parameters):
    code block
    
    return return_value
```

`func = lambda parameters: return_value`

* `lambda`: Keyword that indicates the definition of a lambda function.
* `parameters`: The input parameters that the lambda function will take.
* `return_value`: A single expression that defines the computation the lambda function performs; its result is returned automatically.
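A lambda may take more than one parameter. A minimal sketch comparing a two-parameter lambda with the equivalent `def` (the names `add` and `add_def` are illustrative only):

```python
# A lambda with two parameters; its body is a single expression
add = lambda a, b: a + b

# The equivalent def-based function
def add_def(a, b):
    return a + b

print(add(2, 3))      # 5
print(add_def(2, 3))  # 5
```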

In [97]:
# let's compare the two
# def is a keyword that tells Python we are creating a user-defined function
# square is the name of the function
# x is the parameter that holds the value passed to the function

def square(x):
    """Function to square numbers"""
    out = x ** 2

    return out

square(9)


81

In [98]:
# lambda function
# a lambda is a one-expression anonymous function
# here it is assigned to the variable square_lambda so it can be called by name
# x is the parameter

square_lambda = lambda x: x ** 2

square_lambda(5)


25

In [34]:
# example 2 - a regular function that returns 'Hello World'
# print_hello is the function name (it returns the text rather than printing it)

def print_hello():
    y = 'Hello World'
    
    return y

print_hello()

'Hello World'

In [35]:
# example 3 - the same idea as a lambda with no parameters

letter = lambda : 'Hello world'

letter()

'Hello world'

In [38]:
# example 4

def even_odd(number: int):
    """Check if a number is even or odd"""
    
    if (number % 2 == 0):
        return 'Even'
    else:
        return 'Odd'
    
even_odd(7)
    

'Odd'

In [99]:
7 / 3   # true division: always returns a float

2.3333333333333335

In [100]:
7 % 3   # modulo: the remainder after division

1

In [101]:
7 // 3  # floor division: the quotient rounded down

2
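Floor division (`//`) and modulo (`%`) are linked: the quotient and the remainder rebuild the original number. A quick check with the same numbers, using the built-in `divmod()`:

```python
a, b = 7, 3

quotient = a // b   # floor division -> 2
remainder = a % b   # modulo (remainder) -> 1

# The quotient and remainder always rebuild the original number
assert a == b * quotient + remainder

# divmod() returns both at once
print(divmod(a, b))  # (2, 1)

# This is why "n % 2 == 0" tests for an even number: the remainder after dividing by 2 is 0
print([n % 2 for n in range(5)])  # [0, 1, 0, 1, 0]
```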

In [42]:
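# example 5 - example 4 rewritten as a lambda
# lambda is the keyword that tells Python an anonymous function follows
# the body is a conditional expression: <value if True> if <condition> else <value if False>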

even_odd_lambda = lambda number : 'Even' if (number % 2 == 0) else 'Odd'

even_odd_lambda(7)

'Odd'

In [43]:
# example 6 - the same lambda applied to an even number

even_odd_lambda = lambda number : 'Even' if (number % 2 == 0) else 'Odd'

even_odd_lambda(8)

'Even'

### Generate Random Numbers

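* Use the `NumPy` library to generate random numbers. `np.random.randint(low, high, size=...)` returns integers from `low` up to, but not including, `high`, so `np.random.randint(3, 99, size=120)` draws 120 values between 3 and 98.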
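Because the numbers are random, re-running the notebook gives different values each time. A sketch of making the draw reproducible with NumPy's seeded `Generator`; the seed `42` is an arbitrary choice, and this is an alternative to `np.random.randint`, not what the cells below use:

```python
import numpy as np

# A Generator created with a fixed seed produces the same sequence every run
rng = np.random.default_rng(42)
sample = rng.integers(3, 99, size=120)  # high (99) is excluded by default

# A second Generator with the same seed reproduces the same draw
rng_again = np.random.default_rng(42)
same_sample = rng_again.integers(3, 99, size=120)

print(sample.min() >= 3, sample.max() <= 98)  # True True
print(np.array_equal(sample, same_sample))    # True
```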


In [52]:
# generate random numbers # example 1

random_numbers = np.random.randint(3, 99, size=120)


In [53]:
# display random numbers of example 1
random_numbers

array([48, 33, 27, 24, 63, 28, 67, 60, 10, 90, 12, 62, 88, 95, 35, 49, 85,
       52, 65, 83, 79, 25,  4, 56, 75, 37, 68, 72, 83, 96, 40,  7, 96, 55,
        7, 98, 55,  5, 17, 13, 39, 48, 80, 27, 84, 10, 34, 39, 60, 85, 62,
       19, 44, 23,  3, 51, 96,  3, 94, 85, 26, 71, 96, 41, 32, 77, 75, 74,
       46, 96,  6, 61, 16, 67,  3, 40, 74, 13, 53, 69, 59, 41, 32, 22, 70,
       73, 81,  4, 92, 66, 58, 64, 79, 97, 42,  8, 93, 32, 57, 90, 43, 24,
       29, 47, 11, 83, 47, 17, 74,  7, 80, 97, 98, 55, 54, 51, 76, 81, 90,
        8])

In [54]:
# create a series # example 1

numbers = pd.Series(random_numbers, name="Numbers")
numbers

0      48
1      33
2      27
3      24
4      63
       ..
115    51
116    76
117    81
118    90
119     8
Name: Numbers, Length: 120, dtype: int32

In [55]:
# generate random numbers # example 2 - a smaller sample of 20 numbers

random_numbers = np.random.randint(3, 99, size=20)


In [56]:
# display random numbers of example 2

random_numbers

array([20, 83,  8, 78, 97,  6, 73, 17, 57, 28, 26, 43, 12, 11, 84, 70, 38,
       19, 35, 64])

In [57]:
# create a series # example 2

numbers = pd.Series(random_numbers, name="Numbers")
numbers

0     20
1     83
2      8
3     78
4     97
5      6
6     73
7     17
8     57
9     28
10    26
11    43
12    12
13    11
14    84
15    70
16    38
17    19
18    35
19    64
Name: Numbers, dtype: int32

In [60]:
# display the first five rows of the series

numbers.head()

0    20
1    83
2     8
3    78
4    97
Name: Numbers, dtype: int32

In [61]:
numbers.head(7)

0    20
1    83
2     8
3    78
4    97
5     6
6    73
Name: Numbers, dtype: int32

In [62]:
# display last five rows

numbers.tail()

15    70
16    38
17    19
18    35
19    64
Name: Numbers, dtype: int32

In [63]:
numbers.tail(9)

11    43
12    12
13    11
14    84
15    70
16    38
17    19
18    35
19    64
Name: Numbers, dtype: int32

### Using the `apply()` Function on a Series

* `apply()` calls a function on each value of the Series and returns a new Series with the results, which makes it a powerful way to transform and analyze the data.
* Above, we generated a Series of random numbers and created a function called `square` that takes in a number, squares it and returns the result. Let's apply that function to the Series.

In [65]:
# reminder of function square

square(4)

16

In [66]:
# square the series random numbers 

squared_numbers = numbers.apply(square)
squared_numbers.tail()

15    4900
16    1444
17     361
18    1225
19    4096
Name: Numbers, dtype: int64
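For plain arithmetic such as squaring, the vectorized operator gives the same values as `apply()` without calling a Python function once per element, which is typically faster on large Series. A sketch using the first five values shown earlier:

```python
import pandas as pd

numbers = pd.Series([20, 83, 8, 78, 97], name="Numbers")

def square(x):
    """Function to square numbers"""
    return x ** 2

# apply() calls square() once for every element
via_apply = numbers.apply(square)

# ** works on the whole Series at once (vectorized)
via_vectorized = numbers ** 2

print(via_apply.tolist() == via_vectorized.tolist())  # True
```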

In [70]:
# use .rename() to rename the series

squared_numbers.rename('Squared Numbers', inplace=True) # inplace=True modifies squared_numbers itself instead of returning a renamed copy
squared_numbers.head()

0     400
1    6889
2      64
3    6084
4    9409
Name: Squared Numbers, dtype: int64
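Without `inplace=True`, `rename()` returns a renamed copy and leaves the original untouched, so it can be chained straight after `apply()`. A minimal sketch (`squared` is an illustrative name):

```python
import pandas as pd

numbers = pd.Series([20, 83, 8], name="Numbers")

# rename() without inplace returns a new Series; the original keeps its name
squared = numbers.apply(lambda v: v ** 2).rename("Squared Numbers")

print(numbers.name)  # Numbers
print(squared.name)  # Squared Numbers
```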

### `lambda` function with `apply()`

In [71]:
# Cube the numbers using lambda and apply

cubed_numbers = numbers.apply(lambda j: j ** 3)
cubed_numbers.head()


0      8000
1    571787
2       512
3    474552
4    912673
Name: Numbers, dtype: int64

In [72]:
# rename the series

cubed_numbers.rename('Cubed Numbers', inplace=True)
cubed_numbers.head()

0      8000
1    571787
2       512
3    474552
4    912673
Name: Cubed Numbers, dtype: int64

### Using the `map()` Function on a Series

* `map()` substitutes each value in a Series with another value, looked up from a function, a dictionary or another Series. This makes it a convenient way to transform the values in a Series.
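`map()` also accepts a dictionary. A sketch with a hypothetical grade-to-points lookup; values with no matching key become `NaN`:

```python
import pandas as pd

grades = pd.Series(["A", "B", "C", "A"], name="Grade")

# Hypothetical lookup table; "C" has no entry, so it maps to NaN
grade_points = {"A": 4, "B": 3}
points = grades.map(grade_points)

print(points.tolist())  # [4.0, 3.0, nan, 4.0]
```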

In [74]:
def bmi_value(x):
    if x <= 43:
        return 'Underweight'
    else:
        return 'Overweight'
    
bmi_value(20)

'Underweight'

In [78]:
# map our random numbers as underweight or overweight

bmi_series = numbers.map(bmi_value)
bmi_series.head(10)


0    Underweight
1     Overweight
2    Underweight
3     Overweight
4     Overweight
5    Underweight
6     Overweight
7    Underweight
8     Overweight
9    Underweight
Name: Numbers, dtype: object

In [79]:
bmi_series.rename('BMI', inplace=True)
bmi_series.head()

0    Underweight
1     Overweight
2    Underweight
3     Overweight
4     Overweight
Name: BMI, dtype: object

### `lambda` function with `map()`

In [80]:
# use a lambda function with map() to double each number

double_number = numbers.map(lambda y: y * 2)
double_number.head()

0     40
1    166
2     16
3    156
4    194
Name: Numbers, dtype: int64

In [81]:
# rename the series

double_number.rename('Double Number', inplace=True)
double_number.head()

0     40
1    166
2     16
3    156
4    194
Name: Double Number, dtype: int64

### `lambda` Function with a Conditional Expression

In [92]:
# are the random numbers even or odd

even_odd_series = numbers.apply(lambda k: 'Even' if (k % 2 == 0) else 'Odd')
even_odd_series.tail()

15    Even
16    Even
17     Odd
18     Odd
19    Even
Name: Numbers, dtype: object

In [94]:
# rename the series

even_odd_series.rename('Even/Odd', inplace=True)
even_odd_series.head()

0    Even
1     Odd
2    Even
3    Even
4     Odd
Name: Even/Odd, dtype: object

## Series to DataFrame 

* If a **Series** is like a single *column* of a table, a **DataFrame** is the whole *table*: a two-dimensional structure whose columns are Series that share the same index.
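A minimal sketch of that relationship, using a small hypothetical DataFrame: selecting one column returns a Series that shares the DataFrame's index.

```python
import pandas as pd

# A small DataFrame built from a dictionary of columns
df = pd.DataFrame({
    "Numbers": [20, 83, 8],
    "Even/Odd": ["Even", "Odd", "Even"],
})

# Selecting one column returns a Series named after the column
col = df["Numbers"]

print(isinstance(col, pd.Series))  # True
print(df.shape)                    # (3, 2) -> (rows, columns)
print(col.name)                    # Numbers
```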

In [84]:
# before combining the series into a dataframe, check their names: each .name becomes a column label

print(numbers.name)

print(squared_numbers.name)

Numbers
Squared Numbers


In [95]:
# creating a dataframe

test_df = pd.DataFrame({
    numbers.name: numbers,
    double_number.name: double_number,
    squared_numbers.name : squared_numbers,
    cubed_numbers.name : cubed_numbers,
    bmi_series.name : bmi_series,
    even_odd_series.name : even_odd_series
})

test_df.head()

   Numbers  Double Number  Squared Numbers  Cubed Numbers          BMI  Even/Odd
0       20             40              400           8000  Underweight      Even
1       83            166             6889         571787   Overweight       Odd
2        8             16               64            512  Underweight      Even
3       78            156             6084         474552   Overweight      Even
4       97            194             9409         912673   Overweight       Odd


## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To assist your clients in making informed decisions about property investment, you decide to analyze property data using Pandas. 
1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first and last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage.
7. Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'.

In [10]:
# Question 1: Generate 120 random numbers between Ksh 4000 and Ksh 20,000 using numpy to represent the prices of the houses.

import numpy as np
import pandas as pd

property_prices = np.random.randint(4000, 20001, 120)
property_prices

array([19740, 18639, 15261, 11234, 19855, 12470, 18275,  5699, 10697,
        8128, 16176, 10966,  8075, 14690,  8626,  5577, 19916, 16303,
       18061, 13557,  7264, 14160, 14834,  5026,  8551, 12326, 13709,
       18677, 17626, 11896, 12626, 10192, 15980, 15781,  9922,  7095,
       10521,  8456, 13400, 18800, 19527,  4636, 10919, 17432, 17883,
        4249, 11366, 13089,  4252,  9062, 16163,  4893, 13311,  8755,
       18483,  6747, 12021,  6529, 17875, 11314, 10424,  9531, 17250,
       12912,  9762, 18336, 12166, 11853, 18891, 17648,  8320, 12887,
        7606,  8674, 18497, 19688,  8171,  5021, 12630, 16266, 17146,
       16808,  7314,  4192,  4902, 17525,  5190, 11033, 12634, 10718,
       16343,  4298,  6243, 12779,  8456, 18304, 11392,  5180, 17759,
       13800, 17356, 13138,  7438,  5866, 19828, 14076, 11906, 16031,
       11288,  7009, 11467,  7427,  9388, 14761, 19658, 12945,  6376,
       19433,  5235, 19431])

In [11]:
# Question 2: Display the first 7 houses.

property_prices_series = pd.Series(property_prices)
property_prices_series.head(7)

0    19740
1    18639
2    15261
3    11234
4    19855
5    12470
6    18275
dtype: int32

In [12]:
# Question 2: Display the last 7 houses.

property_prices_series.tail(7)

113    14761
114    19658
115    12945
116     6376
117    19433
118     5235
119    19431
dtype: int32

In [13]:
# Question 3: Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.

def property_category(price):
    """
    Function that takes in the price
    of a house and returns the category of that house.
    """
    # each branch only runs if the previous conditions were False,
    # so only the upper bound needs to be checked
    if price < 3000:
        return "low_cost"
    elif price < 10000:
        return "average_cost"
    elif price < 30000:
        return "mid_cost"
    elif price < 100000:
        return "high_cost"
    else:
        return "living_large"
    

In [33]:
# Question 4: Apply the function created above to the series.

property_per_category = property_prices_series.apply(property_category)
property_per_category.head()

0    mid_cost
1    mid_cost
2    mid_cost
3    mid_cost
4    mid_cost
dtype: object
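With this data only two categories can appear, because every price lies between Ksh 4,000 and Ksh 20,000. As an alternative to `apply()` with `if`/`elif`, pandas' `pd.cut` can bin values in one vectorized call. A sketch, assuming the same thresholds as `property_category` (the function is repeated so the block runs on its own; the sample prices are made up):

```python
import numpy as np
import pandas as pd

def property_category(price):
    """Same thresholds as the property_category() function above."""
    if price < 3000:
        return "low_cost"
    elif price < 10000:
        return "average_cost"
    elif price < 30000:
        return "mid_cost"
    elif price < 100000:
        return "high_cost"
    else:
        return "living_large"

prices = pd.Series([2500, 4000, 15261, 45000, 120000])

# right=False makes every bin [left, right), matching the "<" comparisons above
bins = [-np.inf, 3000, 10000, 30000, 100000, np.inf]
labels = ["low_cost", "average_cost", "mid_cost", "high_cost", "living_large"]
categories = pd.cut(prices, bins=bins, labels=labels, right=False)

print(categories.astype(str).tolist())
# ['low_cost', 'average_cost', 'mid_cost', 'high_cost', 'living_large']
print(categories.astype(str).tolist() == prices.apply(property_category).tolist())  # True
```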

In [26]:
# Question 5: Apply a lambda function to increase the property prices by 10% due to the new tax laws.

taxed_property_prices = property_prices_series.apply(lambda x: x +(x*0.1))
taxed_property_prices.head()

0    21714.0
1    20502.9
2    16787.1
3    12357.4
4    21840.5
dtype: float64

In [39]:
# Question 6: Apply a custom function to increase the property prices by an additional Ksh 250 for garbage.

def add_garbage_fee(price):
    """Add the Ksh 250 garbage fee to a (taxed) property price."""
    garbage_fee = 250
    return price + garbage_fee

In [43]:

# apply the custom function to the taxed prices; using a distinct function name
# avoids overwriting the taxed_property_prices Series
taxed_property_prices_garbage = taxed_property_prices.apply(add_garbage_fee)
taxed_property_prices_garbage.head()


0    21964.0
1    20752.9
2    17037.1
3    12607.4
4    22090.5
dtype: float64

In [52]:
# Question 7: Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'.

data = {
    'Property Prices': property_prices_series,
    'Property Categories': property_per_category,
    'Taxed Prices': taxed_property_prices,
    'Taxed with Garbage Prices': taxed_property_prices_garbage
}


Moringa_property = pd.DataFrame(data)

print("\nMoringa_property:")
print(Moringa_property.head(23))

In [53]:
# A function that takes in two numbers, sums them and checks whether the sum is odd or even.
# The function returns the sum and whether it is odd or even.

# e.g. 5 + 10 = 15: the function takes in 5 and 10 as parameters, sums them to get 15, and returns 15 and "odd".

def add_numbers(a, b):
    """Return the sum of a and b, and whether that sum is odd or even."""
    total = a + b  # avoid naming this variable `sum`, which would shadow the built-in
    # a number is even when its remainder after dividing by 2 is 0
    parity = "even" if total % 2 == 0 else "odd"

    return total, parity

add_numbers(5, 10)

(15, 'odd')