# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Creating Series and DataFrames
* Basic Operations with Series 
* Exploring DataFrame Basics
* Selecting Data from DataFrames
* Applying Functions to Series and DataFrames

## Loading Libraries

In [3]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd 

## What is a Pandas Series?

* A **one-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, and *Python objects*.
* A pandas series is like a column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series holds data of a single data type (integer, float, string, etc.), ensuring homogeneity within the Series. Mixed values are stored under the general `object` dtype.
* **Labeled Index**: Each element in a Series is associated with a label called an *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing explicit loops.
* **Alignment of Data**: When performing operations between Series, Pandas automatically aligns data based on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice, and other data sources.
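
The vectorized-operation and alignment features listed above can be sketched in a few lines. This is only an illustration: the names `prices` and `discounts` and their values are made up for the example.

```python
import pandas as pd

# Hypothetical data: two Series whose labels only partly overlap
prices = pd.Series([100, 200, 300], index=["a", "b", "c"])
discounts = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Vectorized operation: every element is doubled without an explicit loop
doubled = prices * 2

# Alignment: values are matched by label, not by position.
# Labels that appear in only one Series produce NaN in the result.
net = prices - discounts

print(doubled)
print(net)
```

Only the labels `b` and `c` exist in both Series, so only those two rows of `net` hold numbers.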

In [4]:
# example of a series from a list 
marks = [10, 20, 33, 42, 19, 30]

# series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying

In [5]:
# example 1 - Creating a series from a list
data = [10.5, 11.2, 10.7, 9.9, 10.2]

# series
list_series = pd.Series(data, name="Student Marks")
list_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [6]:
# data type 
type(list_series)

pandas.core.series.Series

In [8]:
# example 2 - Creating a series from a NumPy Array
data_arr = np.array(data) # created an array from a list

type(data_arr)

numpy.ndarray

In [9]:
# series from array
arr_series = pd.Series(data_arr, name="Array Series")
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [10]:
# example 3 - Creating a series from a dictionary
data_dict = {
    "Prof" : 100,
    "Dominic" : 250,
    "Carol" : 300, 
    "Eve" : 450
}

type(data_dict)

dict

In [11]:
# series from dict
dict_series = pd.Series(data_dict, name="Sky Team")
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [17]:
# series with custom index labels
balance = [1000, 1500, 2000, 4000] # data to store in the series
custom_labels = ['A', 'B', 'C', 'D'] # custom indexes

custom_label_series = pd.Series(data = balance, index=custom_labels, name='Balances')
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

## Basic Operations With Series

In [18]:
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [19]:
# accessing elements in a series 
print(arr_series[3])

9.9


In [20]:
dict_series

Prof       100
Dominic    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [21]:
# accessing elements in a series 
print(dict_series['Carol'])

300


In [22]:
custom_label_series

A    1000
B    1500
C    2000
D    4000
Name: Balances, dtype: int64

In [24]:
# accessing elements in a series
print(custom_label_series['B':'D'])

B    1500
C    2000
D    4000
Name: Balances, dtype: int64
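
Notice that the label slice `'B':'D'` includes the end label `'D'`. Position-based slices behave differently and exclude the end position. A small sketch using `.loc` and `.iloc`, with the balances recreated under the assumed name `balances` so the block runs on its own:

```python
import pandas as pd

balances = pd.Series([1000, 1500, 2000, 4000],
                     index=["A", "B", "C", "D"], name="Balances")

# .loc slices by label and includes the end label 'D'
by_label = balances.loc["B":"D"]

# .iloc slices by position and excludes the end position 3
by_position = balances.iloc[1:3]

print(by_label.tolist())     # [1500, 2000, 4000]
print(by_position.tolist())  # [1500, 2000]
```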


In [25]:
# arithmetic operations
# divide every balance by 100 in a single vectorized step
x = custom_label_series / 100
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [26]:
# filter elements 
x_filtered = x[x >= 15]
x_filtered

B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64
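
The filter works because `x >= 15` is itself a Series of `True`/`False` values (a boolean mask), and indexing with it keeps only the `True` rows. Here is a sketch with the same values recreated so it runs on its own. The combined condition is an extra illustration, not part of the cells above.

```python
import pandas as pd

x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="Balances")

# The comparison returns a boolean Series with the same index
mask = x >= 15
print(mask.tolist())  # [False, True, True, True]

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = x[(x >= 15) & (x < 40)]
print(in_range.tolist())  # [15.0, 20.0]
```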

In [28]:
# basic summary statistics
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: Balances, dtype: float64

In [29]:
# mean 
mean = x.mean()
print(mean)

21.25


In [30]:
# std 
std = x.std()
print(std)

13.149778198382917


In [31]:
# max
max_value = x.max()  # avoid the name `max`, which would shadow Python's built-in max()
print(max_value)

40.0
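
The individual statistics above can also be collected in one call. The sketch below recreates the same values and uses `describe()`. Note that `std()` in pandas computes the sample standard deviation (it divides by `n - 1`), which is why the result above is about 13.15.

```python
import pandas as pd

x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="Balances")

# describe() returns count, mean, std, min, quartiles and max as a new Series
summary = x.describe()
print(summary)

# Individual values can be read from the summary by label
print(summary["mean"])  # 21.25
print(summary["max"])   # 40.0
```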


## Applying Functions to a Series 

### Lambda Functions

* A small anonymous function that is not bound to an identifier.
* Similar to a user-defined function, but without a name.
* It is simple and straightforward, requiring only the keyword `lambda`, the argument(s), and an expression.
* Its body is a single expression, so it usually fits on one line.

```
def func_name(parameters):
    code block
    
    return return_value
```

`func = lambda parameters: return_value`

* `lambda` : Keyword that indicates definition of a lambda function.
* `parameters`: The input parameters that the lambda function will take.
* `return_value`: A single expression that defines the computation the lambda function performs and its return value.
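
To make the syntax above concrete, here is a short sketch in plain Python. The names `add`, `names` and `by_length` are made up for the illustration.

```python
# A lambda can take more than one parameter
add = lambda a, b: a + b
print(add(2, 3))  # 5

# Lambdas are often passed directly as arguments, for example as a sort key
names = ["Dominic", "Eve", "Carol", "Prof"]
by_length = sorted(names, key=lambda name: len(name))
print(by_length)  # ['Eve', 'Prof', 'Carol', 'Dominic']
```

Passing a lambda as an argument like this is the same pattern used later with `apply()` and `map()`.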

In [5]:
# let's compare the two
def square(x):
    # function to square numbers
    out = x**2
    return out

square(9)



81

In [6]:
# lambda function
square_lambda = lambda x: x**2
square_lambda(5)

25

In [None]:
# lambda function that takes no parameters and returns 'Hello World'
letter = lambda: 'Hello World'

In [10]:
def even_odd(number: int):
    if number % 2 == 0:
        return 'Even'
    else:
        return 'Odd'

In [13]:
# 'Odd' must be a string; the bare name odd would raise a NameError for odd inputs
even_lambda = lambda number: 'Even' if number % 2 == 0 else 'Odd'
even_lambda(8)

'Even'

### Generate Random Numbers

* Use the `NumPy` library to generate random numbers.

In [4]:
# generate 120 random integers from 3 (inclusive) to 99 (exclusive)
random_numbers = np.random.randint(3, 99, size=120)


In [20]:
# display random numbers 
random_numbers


array([51,  6, 84, 67,  7, 51, 97, 49, 62, 31,  8, 40, 12, 29, 36, 33, 60,
        6, 18, 65, 91,  5, 74, 83, 83, 96, 47, 23, 26, 46,  7,  9, 12, 32,
       83, 44, 31, 10, 58, 89, 61, 67, 46, 78, 43, 26, 51, 57, 51, 39, 94,
       89, 23, 31, 34, 51, 56, 28, 78, 70, 49, 47, 22, 19, 57, 14, 59, 30,
       31, 48, 96, 35, 35, 63, 15, 75, 94, 26, 41, 35, 41, 37, 75, 98, 84,
       43, 68,  8, 94, 10, 76, 11,  9, 27, 96, 26, 90, 19, 51])
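
Because the numbers are random, every run of the notebook produces different values. If reproducible output is wanted, one option is a seeded generator. The sketch below uses NumPy's `default_rng` interface. The seed value 42 and the names `rng`, `reproducible_numbers` and `again` are arbitrary choices for the illustration.

```python
import numpy as np

# A generator created with a fixed seed always yields the same sequence
rng = np.random.default_rng(seed=42)

# 120 integers from 3 (inclusive) up to 99 (exclusive)
reproducible_numbers = rng.integers(3, 99, size=120)

# Re-creating the generator with the same seed gives the same numbers
again = np.random.default_rng(seed=42).integers(3, 99, size=120)
print((reproducible_numbers == again).all())  # True
```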

In [24]:
# create a series 
numbers=pd.Series(random_numbers, name="Numbers")
numbers


0      84
1      83
2      47
3       7
4      41
       ..
115    60
116    13
117    32
118    42
119    35
Name: Numbers, Length: 120, dtype: int32

In [25]:
# display the first five rows of the series
numbers.head()


0    84
1    83
2    47
3     7
4    41
Name: Numbers, dtype: int32

In [26]:
numbers.head(7)

0    84
1    83
2    47
3     7
4    41
5    66
6    46
Name: Numbers, dtype: int32

In [27]:
# display last five rows
numbers.tail()


115    60
116    13
117    32
118    42
119    35
Name: Numbers, dtype: int32

### Using the `apply()` Function in a Series

* `apply()` is a powerful way to transform and analyze the data within a Series.
* Above, we generated a Series of random numbers and created a function called `square` that takes in a number, squares it, and returns the result. Let's apply that function to the Series.
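
As a small sketch of the idea, `apply()` calls the function once for each element and collects the results in a new Series. For simple arithmetic like squaring, a vectorized expression gives the same values. The Series `sample` and its contents are made up so the block runs on its own.

```python
import pandas as pd

def square(x):
    # return the square of a single number
    return x ** 2

sample = pd.Series([1, 2, 3, 4], name="Sample")

# apply() runs square() on each element
applied = sample.apply(square)

# the vectorized equivalent operates on the whole Series at once
vectorized = sample ** 2

print(applied.tolist())     # [1, 4, 9, 16]
print(vectorized.tolist())  # [1, 4, 9, 16]
```

`apply()` is most useful when the logic cannot be written as a simple arithmetic expression, for example a function containing `if` statements.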

In [28]:
square(4)

16

In [32]:
# square the series random numbers 
squared_numbers=numbers.apply(square)
squared_numbers.tail()


115    3600
116     169
117    1024
118    1764
119    1225
Name: Numbers, dtype: int64

In [34]:
# use .rename to rename the series
squared_numbers.rename('Squared Numbers', inplace=True)  # inplace=True changes the name on the existing series itself

0      7056
1      6889
2      2209
3        49
4      1681
       ... 
115    3600
116     169
117    1024
118    1764
119    1225
Name: Squared Numbers, Length: 120, dtype: int64

### `lambda` function with `apply()`

In [35]:
# Cube the numbers using lambda and apply
cubed_numbers=numbers.apply(lambda j:j**3)
cubed_numbers.head()


0    592704
1    571787
2    103823
3       343
4     68921
Name: Numbers, dtype: int64

In [39]:
# rename the series
cubed_numbers.rename('Cubed Numbers', inplace=True)
cubed_numbers.head()

0    592704
1    571787
2    103823
3       343
4     68921
Name: Cubed Numbers, dtype: int64

### Using the `map()` Function in a series

* `map()` substitutes each value in a Series with another value, which makes it a convenient way to transform the values in a Series. The mapping can be given as a function, a dictionary, or another Series.
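
A short sketch of value substitution with a dictionary. The Series `grades` and the grade-to-points lookup are made up for the illustration.

```python
import pandas as pd

grades = pd.Series(["A", "B", "C", "A"], name="Grades")

# map() with a dictionary: each value is looked up and replaced
points = grades.map({"A": 4, "B": 3, "C": 2})
print(points.tolist())  # [4, 3, 2, 4]

# values that are missing from the dictionary become NaN
partial = grades.map({"A": 4})
print(partial.isna().tolist())  # [False, True, True, False]
```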

In [54]:
def bmi_value(x):
    if x <= 43:
        return 'Underweight'
    else:
        return 'Overweight'

bmi_value(20)

'Underweight'

In [47]:
# map our random numbers as pass or fail


### `lambda` function with `map()`

In [55]:
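# use map() with the named function bmi_value to label each number
bmi_series = numbers.map(bmi_value)
# use a lambda function with map() to double each number
doubled_numbers = numbers.map(lambda n: n * 2)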


In [79]:
# rename the series

### `lambda` function with Conditional Statement

In [85]:
# are the random numbers even or odd


In [None]:
# rename the series
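
One possible way to fill in the two cells above is sketched here. To keep the block self-contained it uses only the first five values of `numbers` displayed earlier. The name `even_odd_series` and the label `'Even or Odd'` are assumptions for the illustration.

```python
import pandas as pd

# first five values of the random series shown earlier
numbers = pd.Series([84, 83, 47, 7, 41], name="Numbers")

# a lambda with a conditional expression labels each number
even_odd_series = numbers.map(lambda n: "Even" if n % 2 == 0 else "Odd")

# rename() returns a renamed copy, so assign the result back
even_odd_series = even_odd_series.rename("Even or Odd")

print(even_odd_series.tolist())  # ['Even', 'Odd', 'Odd', 'Odd', 'Odd']
```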

## Series to DataFrame 

* If a **Series** is like a *table* with a single column, a **DataFrame** is a *table* whose columns are Series sharing the same index. It typically has two or more columns.
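
A minimal sketch of combining Series into a DataFrame with `pd.concat`. Small made-up values stand in for the random numbers so the block runs on its own, and the name `df` is an assumption.

```python
import pandas as pd

numbers = pd.Series([2, 3, 4], name="Numbers")
squared = numbers.apply(lambda n: n ** 2).rename("Squared Numbers")
cubed = numbers.apply(lambda n: n ** 3).rename("Cubed Numbers")

# axis=1 places the Series side by side; each Series name becomes a column name
df = pd.concat([numbers, squared, cubed], axis=1)
print(df)

# a single Series can also be turned into a one-column DataFrame
single = numbers.to_frame()
print(single.shape)  # (3, 1)
```

Rows are matched on the index, so Series that share the same index line up row by row.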

In [None]:
# let's convert all the series we created into a dataframe




## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To assist your clients in making informed decisions about property investment, you decide to analyze property data using Pandas.

1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first 7 and the last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the Series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage collection.
7. Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'.

In [7]:
import numpy as np 

In [9]:
random_nyumbas =np.random.randint(4000,20000, size=120)