# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Creating Series and DataFrames
* Basic Operations with Series 
* Exploring DataFrame Basics
* Selecting Data from DataFrames
* Applying Functions to Series and DataFrames

## Loading Libraries

In [2]:
# numpy - for numerical computing: fast arrays and high-level mathematical functions that operate on them
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd
data = [10, 20, 30, 40, 50]
labels = ['A', 'B', 'C', 'D', 'E']

# Create a Series with data and custom labels
series = pd.Series(data, index=labels)


## What is a Pandas Series?

* **One-Dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, *Python objects*, etc.
* A pandas series is like a column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series has a single data type (`dtype`), such as integer, float, or string. If you mix types, pandas falls back to the generic `object` dtype, so the Series still has one dtype.
* **Labeled Index**: Each element in a Series is associated with a label called an *index*. Having unique labels is common practice, though not strictly required. The labels just need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, pandas automatically aligns the data on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a single row or column of a DataFrame, and other data sources.
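A minimal sketch of the homogeneity, vectorization, and alignment features listed above, using two small hypothetical Series `a` and `b` (illustrative values, not data from this notebook):

```python
import pandas as pd

# Mixed Python types fall back to the generic object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object

# Two small Series whose index labels only partly overlap
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Vectorized: the multiplication is applied to every element, no loop needed
doubled = a * 2

# Alignment: values are matched by label, not by position.
# Labels found in only one Series ("x" and "w") produce NaN.
total = a + b

# fill_value treats a missing label as 0 instead of producing NaN
total_filled = a.add(b, fill_value=0)

print(doubled.tolist())  # [2, 4, 6]
print(total)
print(total_filled)
```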

In [3]:
# example of a series from a list 
marks = [23, 19, 45, 67, 98, 100, 32, 76 ]


# create the Series (pandas assigns a default integer index 0, 1, 2, ...)
marks_series = pd.Series(marks)
marks_series

0     23
1     19
2     45
3     67
4     98
5    100
6     32
7     76
dtype: int64

## Creating and Displaying

In [4]:
#example 1 Creating a series from a list
data = [3.09, 3.7, 2.4, 1.234, 8.9, 3.4, 5]

list_series = pd.Series(data, name="Composite Scores")
list_series

0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Composite Scores, dtype: float64

In [5]:
type(list_series)

pandas.core.series.Series

In [6]:
# Creating a Series from a NumPy array
data_arr = np.array(data)  # create an array from the list

type(data_arr)

numpy.ndarray

In [7]:
#series from array
arr_series = pd.Series(data_arr, name="Array")

arr_series



0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Array, dtype: float64

In [8]:
# Series from a dictionary: the keys become the index labels
data_dict = {
    "Nakuru": 3.09,
    "Kisumu": 3.7,
    "Nyeri": 2.4,
    "Mombasa": 1.234,
    "Eldoret": 8.9,
    "Meru": 3.4,
    "Nairobi": 5
}

type(data_dict)

dict

In [9]:
dict_series= pd.Series(data_dict, name= "COA Sores")
dict_series

Nakuru     3.090
Kisumu     3.700
Nyeri      2.400
Mombasa    1.234
Eldoret    8.900
Meru       3.400
Nairobi    5.000
Name: COA Sores, dtype: float64
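When a dictionary is combined with an explicit `index`, pandas keeps only the listed keys, in the listed order, and fills any label that is missing from the dictionary with `NaN`. A small sketch; `"Thika"` is a hypothetical label that is not in the dictionary:

```python
import pandas as pd

data_dict = {"Nakuru": 3.09, "Kisumu": 3.7, "Nyeri": 2.4, "Nairobi": 5}

# Keys are selected and reordered by the index; "Thika" has no value, so it becomes NaN
subset = pd.Series(data_dict, index=["Nairobi", "Nakuru", "Thika"], name="COA Scores")
print(subset)
```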

In [10]:

Percent_score = [100, 65, 78, 34, 90, 86, 20]
index_labels = [1, 2, 3, 4, 5, 6, 7]

# Create a pandas Series with custom labels and a specified name
custom_labels = pd.Series(data=Percent_score, index=index_labels, name="COA Marks")

# Display the resulting Series
custom_labels


1    100
2     65
3     78
4     34
5     90
6     86
7     20
Name: COA Marks, dtype: int64

In [11]:
Percent_score = [100, 65, 78, 34, 90, 86, 20]
index_labels = list(data_dict.keys())  # reuse the city names as labels

# Create a pandas Series with custom labels and a specified name
custom_labels = pd.Series(data=Percent_score, index=index_labels, name="COA Marks")

# Display the resulting Series
custom_labels


Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

## Basic Operations With Series

In [12]:
arr_series

0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Array, dtype: float64

In [13]:
print(arr_series[4])

8.9


In [14]:
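# Example 1: access an element of a Series by position with .iloc
# (plain custom_labels[4] relies on a positional fallback that newer pandas deprecates for non-integer indexes)
elements = custom_labels.iloc[4]
print(elements)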

90


In [15]:
dict_series

Nakuru     3.090
Kisumu     3.700
Nyeri      2.400
Mombasa    1.234
Eldoret    8.900
Meru       3.400
Nairobi    5.000
Name: COA Sores, dtype: float64

In [16]:
print(dict_series["Kisumu"])

3.7


In [17]:
custom_labels

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

In [18]:
print(custom_labels['Nakuru':'Eldoret'])

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Name: COA Marks, dtype: int64
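Label slices such as `'Nakuru':'Eldoret'` include the end label, unlike ordinary Python slices. A short sketch contrasting `.loc` (by label) with `.iloc` (by position), rebuilding the same marks Series:

```python
import pandas as pd

marks = pd.Series(
    [100, 65, 78, 34, 90, 86, 20],
    index=["Nakuru", "Kisumu", "Nyeri", "Mombasa", "Eldoret", "Meru", "Nairobi"],
    name="COA Marks",
)

# Label-based slice: the end label "Eldoret" IS included
by_label = marks.loc["Nakuru":"Eldoret"]

# Position-based slice: like Python lists, position 4 is NOT included
by_position = marks.iloc[0:4]

# Single element by position and by label
fifth = marks.iloc[4]
kisumu = marks.loc["Kisumu"]

print(len(by_label), len(by_position), fifth, kisumu)  # 5 4 90 65
```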


In [19]:
# arithmetic operations
# convert percentages to fractions (vectorized: every element is divided by 100)
x = custom_labels / 100
x


Nakuru     1.00
Kisumu     0.65
Nyeri      0.78
Mombasa    0.34
Eldoret    0.90
Meru       0.86
Nairobi    0.20
Name: COA Marks, dtype: float64

In [20]:
x = x * 100
x

Nakuru     100.0
Kisumu      65.0
Nyeri       78.0
Mombasa     34.0
Eldoret     90.0
Meru        86.0
Nairobi     20.0
Name: COA Marks, dtype: float64

In [21]:
# filter the elements: keep only marks of 78 or above (boolean indexing)
x_filtered = x[x >= 78]
x_filtered


Nakuru     100.0
Nyeri       78.0
Eldoret     90.0
Meru        86.0
Name: COA Marks, dtype: float64
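Conditions can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, and Python's `and`/`or` keywords do not work here because pandas raises an error about the truth value of a Series being ambiguous. A sketch on the same marks:

```python
import pandas as pd

x = pd.Series(
    [100.0, 65.0, 78.0, 34.0, 90.0, 86.0, 20.0],
    index=["Nakuru", "Kisumu", "Nyeri", "Mombasa", "Eldoret", "Meru", "Nairobi"],
    name="COA Marks",
)

# Marks from 50 up to, but not including, 90
mid_range = x[(x >= 50) & (x < 90)]

# between() is a shortcut; both ends are inclusive by default
also_mid = x[x.between(50, 89)]

print(mid_range.index.tolist())  # ['Kisumu', 'Nyeri', 'Meru']
```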

In [22]:
# summary statistics
custom_labels

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

In [23]:
mean = x.mean()
print(mean)

67.57142857142857


In [24]:
std = x.std()
print(std)

29.999206338708046


In [25]:
max_value = x.max()  # avoid naming a variable max: it would shadow Python's built-in max()
print(max_value)

100.0
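`describe()` returns the common summary statistics in one call. Note that `std()` computes the *sample* standard deviation by default (`ddof=1`, dividing by n - 1), which is why the value above is about 30.0; pass `ddof=0` for the population version. A sketch on the same marks:

```python
import pandas as pd

x = pd.Series([100.0, 65.0, 78.0, 34.0, 90.0, 86.0, 20.0], name="COA Marks")

summary = x.describe()          # count, mean, std, min, quartiles, max
sample_std = x.std()            # default ddof=1: divides by n - 1
population_std = x.std(ddof=0)  # divides by n

print(summary)
print(sample_std, population_std)
```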


## Applying Functions to a Series 

### Lambda Functions

* A small anonymous function, defined without a name using the `lambda` keyword (although it can still be assigned to a variable, as in the examples below).
* Similar to a user-defined function, but written without a `def` statement.
* Simple and straightforward: it needs only the keyword `lambda`, the argument(s), and an expression.
* It is limited to a single expression, so it fits on one line.

A regular function is defined with `def`:

```python
def func_name(parameters):
    # code block
    return return_value
```

A lambda function packs the same idea into a single expression:

`func = lambda parameters: expression`

* `lambda`: Keyword that indicates the definition of a lambda function.
* `parameters`: The input parameters that the lambda function takes.
* `expression`: A single expression that defines the computation the lambda function performs; its result is returned automatically.
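A few more shapes a lambda can take, as a small illustrative sketch: several parameters, no parameters, and the very common use as a short `key` function passed to another function such as `sorted()`:

```python
# A lambda can take several parameters...
add = lambda a, b: a + b

# ...or none at all
greet = lambda: "hello Universe"

# A common use: a short key function, here sorting (city, score) pairs by score
scores = [("Nakuru", 3.09), ("Kisumu", 3.7), ("Nyeri", 2.4)]
by_score = sorted(scores, key=lambda pair: pair[1])

print(add(2, 3))    # 5
print(greet())      # hello Universe
print(by_score[0])  # ('Nyeri', 2.4)
```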

In [26]:
# Compare the two
def square(x):

    out = x ** 2

    return out
square(7)

49

In [27]:
# lambda function
square_lambda = lambda x: x ** 2

square_lambda(8)

64

In [28]:
# regular function that returns a greeting (compare with the lambda version below)
def print_hello():
    y = 'hello Universe'

    return y

print_hello()

'hello Universe'

In [29]:
letter = lambda: "hello Universe"

letter()



'hello Universe'

In [30]:
Opening_statement = lambda: "Join the clan"

Opening_statement()

'Join the clan'

In [31]:
def even_odd(number: int) -> str:
    """Check whether a number is even or odd."""
    if number % 2 == 0:
        return 'Even'
    else:
        return 'Odd'

even_odd(75)

'Odd'

In [32]:
even_lambda = lambda number: "Even" if number % 2 == 0 else "Odd"

even_lambda(76)

'Even'

In [33]:
even_lambda = lambda number: "Even" if number % 2 == 0 else "Odd"

even_lambda(79)

'Odd'

### Generate Random Numbers

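Use the NumPy library to generate random numbers. `np.random.randint(low, high, size)` returns `size` integers from `low` (inclusive) up to `high` (exclusive), so `np.random.randint(1, 100, size=35)` produces 35 values between 1 and 99.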
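Because `np.random.randint` gives different numbers on every run, the outputs below cannot be reproduced exactly. A sketch using NumPy's newer `Generator` API with a fixed seed, so the same numbers come back every time (the seed value 42 is arbitrary):

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
values = rng.integers(1, 100, size=35)  # 1 to 99; high is exclusive by default

# Re-creating the generator with the same seed repeats the sequence exactly
again = np.random.default_rng(seed=42).integers(1, 100, size=35)

numbers = pd.Series(values, name="Evaluation Team")
print(numbers.head())
```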

In [34]:
import numpy as np

In [35]:
random_numbers = np.random.randint(1, 100, size=35)

In [36]:
# how to display the generated numbers
random_numbers

array([41, 95, 19, 97, 52, 16, 85, 76, 56, 59, 18, 71, 93,  8, 83, 13, 51,
       76, 81, 19, 32, 82, 36, 54, 75, 50, 31, 11, 56, 81, 12, 10, 51, 38,
       27])

In [37]:
import pandas as pd

In [38]:
numbers = pd.Series(random_numbers, name= "Evaluation Team")

numbers

0     41
1     95
2     19
3     97
4     52
5     16
6     85
7     76
8     56
9     59
10    18
11    71
12    93
13     8
14    83
15    13
16    51
17    76
18    81
19    19
20    32
21    82
22    36
23    54
24    75
25    50
26    31
27    11
28    56
29    81
30    12
31    10
32    51
33    38
34    27
Name: Evaluation Team, dtype: int32

In [39]:
# display the first five rows of the series
numbers.head()


0    41
1    95
2    19
3    97
4    52
Name: Evaluation Team, dtype: int32

In [40]:
# display the first 10 rows of the series
numbers.head(10)

0    41
1    95
2    19
3    97
4    52
5    16
6    85
7    76
8    56
9    59
Name: Evaluation Team, dtype: int32

In [41]:
# display last five rows
numbers.tail()

30    12
31    10
32    51
33    38
34    27
Name: Evaluation Team, dtype: int32

In [42]:
numbers.tail(7)

28    56
29    81
30    12
31    10
32    51
33    38
34    27
Name: Evaluation Team, dtype: int32

### Using the `apply()` Function in a Series

* `apply()` calls a function on every element of the Series and collects the results into a new Series, which makes it a flexible way to transform and analyze the data.
* Above, we generated a Series of random numbers and created a function called `square` that takes a number, squares it, and returns the result. Let's apply that function to the Series.
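For simple arithmetic like squaring, `apply()` is not required: the operator works on the whole Series at once and is usually faster, since `apply()` calls the Python function once per element. A sketch comparing both on the first five random numbers from this notebook:

```python
import numpy as np
import pandas as pd

def square(x):
    return x ** 2

numbers = pd.Series(np.array([41, 95, 19, 97, 52]), name="Evaluation Team")

via_apply = numbers.apply(square)  # calls square() once per element
via_vector = numbers ** 2          # one vectorized operation over the whole Series

print((via_apply == via_vector).all())  # True
```

`apply()` remains the better fit when the function contains logic that has no vectorized equivalent.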

In [43]:
# quick check: square() on a single number
square(4)

16

In [44]:
# apply square() to every element of the Series
squared_numbers = numbers.apply(square)
squared_numbers

0     1681
1     9025
2      361
3     9409
4     2704
5      256
6     7225
7     5776
8     3136
9     3481
10     324
11    5041
12    8649
13      64
14    6889
15     169
16    2601
17    5776
18    6561
19     361
20    1024
21    6724
22    1296
23    2916
24    5625
25    2500
26     961
27     121
28    3136
29    6561
30     144
31     100
32    2601
33    1444
34     729
Name: Evaluation Team, dtype: int64

In [45]:
# use .rename to rename the series
squared_numbers.rename("Squared Numbers", inplace=True)  # inplace=True renames squared_numbers itself rather than returning a renamed copy



0     1681
1     9025
2      361
3     9409
4     2704
5      256
6     7225
7     5776
8     3136
9     3481
10     324
11    5041
12    8649
13      64
14    6889
15     169
16    2601
17    5776
18    6561
19     361
20    1024
21    6724
22    1296
23    2916
24    5625
25    2500
26     961
27     121
28    3136
29    6561
30     144
31     100
32    2601
33    1444
34     729
Name: Squared Numbers, dtype: int64

In [46]:
squared_numbers.head(4)

0    1681
1    9025
2     361
3    9409
Name: Squared Numbers, dtype: int64
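An alternative to `inplace=True` is to call `rename()` without it, which returns a new Series and leaves the original untouched; assigning the result back keeps the new name. A minimal sketch:

```python
import pandas as pd

original = pd.Series([1681, 9025, 361], name="Evaluation Team")

# Without inplace, rename() returns a renamed copy
renamed = original.rename("Squared Numbers")

print(original.name)  # Evaluation Team
print(renamed.name)   # Squared Numbers
```

Writing `squared_numbers = squared_numbers.rename("Squared Numbers")` makes the data flow explicit, which many pandas users prefer.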

### `lambda` function with `apply()`

In [47]:
# Cube the numbers using lambda and apply
cubed_numbers = numbers.apply(lambda k: k ** 3)
cubed_numbers.head()

0     68921
1    857375
2      6859
3    912673
4    140608
Name: Evaluation Team, dtype: int64

In [48]:
# Cube the numbers using lambda and apply
cubed_numbers = numbers.apply(lambda k: k ** 3)
cubed_numbers.tail()

30      1728
31      1000
32    132651
33     54872
34     19683
Name: Evaluation Team, dtype: int64

In [49]:
# rename the series
cubed_numbers.rename("Cubed Numbers", inplace=True)

0      68921
1     857375
2       6859
3     912673
4     140608
5       4096
6     614125
7     438976
8     175616
9     205379
10      5832
11    357911
12    804357
13       512
14    571787
15      2197
16    132651
17    438976
18    531441
19      6859
20     32768
21    551368
22     46656
23    157464
24    421875
25    125000
26     29791
27      1331
28    175616
29    531441
30      1728
31      1000
32    132651
33     54872
34     19683
Name: Cubed Numbers, dtype: int64

In [50]:
cubed_numbers.head()

0     68921
1    857375
2      6859
3    912673
4    140608
Name: Cubed Numbers, dtype: int64

### Using the `map()` Function in a series

* `map()` substitutes each value in a Series with another value, using a function, a dictionary, or another Series as the mapping. This makes it a convenient way to transform the values in a Series.
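Besides a function, `map()` also accepts a dictionary: each value is looked up as a key, and values without a matching key become `NaN`. A sketch with hypothetical city codes, used only for illustration:

```python
import pandas as pd

# Hypothetical city codes, for illustration only
codes = pd.Series(["NKR", "KSM", "NBI", "MSA"])
city_names = {"NKR": "Nakuru", "KSM": "Kisumu", "NBI": "Nairobi"}

# "MSA" has no entry in the dictionary, so it maps to NaN
mapped = codes.map(city_names)
print(mapped)
```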

In [51]:
def bmi_value(x):
    # values of 1345 and above are labeled "Nonaless", everything else "Umenona"
    if x >= 1345:
        return "Nonaless"
    else:
        return "Umenona"

bmi_value(1345)

'Nonaless'

In [52]:
# map the random numbers to "Nonaless" or "Umenona" (all values are below 1345, so every one maps to "Umenona")
bmi_series = numbers.map(bmi_value)
bmi_series.rename("BMI Series", inplace=True)
bmi_series.head(8)


0    Umenona
1    Umenona
2    Umenona
3    Umenona
4    Umenona
5    Umenona
6    Umenona
7    Umenona
Name: BMI Series, dtype: object

### `lambda` function with `map()`

In [53]:
# use a lambda function with map() to double each number
double_number = numbers.map(lambda t: t * 2)
double_number.rename("Doubled Numbers", inplace=True)
double_number.head(5)

0     82
1    190
2     38
3    194
4    104
Name: Doubled Numbers, dtype: int64

### `lambda` function with Conditional Statement

In [54]:
# are the random numbers even or odd
even_odd_series = numbers.apply(lambda k: 'Even' if (k % 2 == 0) else 'Odd')
even_odd_series.rename("Even Odd Series", inplace=True)
even_odd_series.head()

0     Odd
1     Odd
2     Odd
3     Odd
4    Even
Name: Even Odd Series, dtype: object
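The same even/odd labeling can be done without calling a Python function per element: `np.where` picks one of two values depending on a vectorized condition. A sketch on the first five random numbers, checking that both approaches agree:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([41, 95, 19, 97, 52], name="Evaluation Team")

# Row by row: the lambda is called once per element
with_apply = numbers.apply(lambda k: "Even" if k % 2 == 0 else "Odd")

# Vectorized: "Even" where the condition holds, "Odd" elsewhere
with_where = pd.Series(np.where(numbers % 2 == 0, "Even", "Odd"), index=numbers.index)

print(with_where.tolist())  # ['Odd', 'Odd', 'Odd', 'Odd', 'Even']
```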

## Series to DataFrame 

* If a **Series** is like a single column of a table, a **DataFrame** is the whole table: a two-dimensional structure of one or more columns (each of them a Series) that share the same index.
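Two common ways to go from Series to a DataFrame, as a short sketch: `to_frame()` turns one Series into a one-column DataFrame, and `pd.concat(..., axis=1)` places several Series side by side, using each Series' `.name` as its column name:

```python
import pandas as pd

numbers = pd.Series([41, 95, 19], name="Evaluation Team")
squared = (numbers ** 2).rename("Squared Numbers")

# A single Series becomes a one-column DataFrame
single_df = numbers.to_frame()

# Several Series side by side; the column names come from each Series' name
combined = pd.concat([numbers, squared], axis=1)

print(single_df.shape)            # (3, 1)
print(combined.columns.tolist())  # ['Evaluation Team', 'Squared Numbers']
```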

In [55]:
import pandas as pd

In [56]:
# converting Series into a DataFrame: each Series' .name becomes a column name
mine_df = pd.DataFrame({
    numbers.name: numbers,
    cubed_numbers.name: cubed_numbers,
    squared_numbers.name: squared_numbers,
    bmi_series.name: bmi_series,
    even_odd_series.name: even_odd_series,
})
mine_df

|    | Evaluation Team | Cubed Numbers | Squared Numbers | BMI Series | Even Odd Series |
|---:|---:|---:|---:|:---|:---|
| 0 | 41 | 68921 | 1681 | Umenona | Odd |
| 1 | 95 | 857375 | 9025 | Umenona | Odd |
| 2 | 19 | 6859 | 361 | Umenona | Odd |
| 3 | 97 | 912673 | 9409 | Umenona | Odd |
| 4 | 52 | 140608 | 2704 | Umenona | Even |
| 5 | 16 | 4096 | 256 | Umenona | Even |
| 6 | 85 | 614125 | 7225 | Umenona | Odd |
| 7 | 76 | 438976 | 5776 | Umenona | Even |
| 8 | 56 | 175616 | 3136 | Umenona | Even |
| 9 | 59 | 205379 | 3481 | Umenona | Odd |


## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To help your clients make informed decisions about property investment, you decide to analyze property data using pandas.

1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first and last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the Series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage.
7. Create a new Series for each step, and finally combine them all into a DataFrame named 'Moringa_property'.

In [80]:
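# Generate 120 random house prices in Ksh
# (high is exclusive, so values fall between 4000 and 19999; use 20001 to include Ksh 20,000)
random_numbers = np.random.randint(4000, 20000, size=120)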
random_numbers

array([10031,  9026,  6613,  7620, 14198, 10022,  6546, 12065, 16651,
        4050, 12118,  5314, 16423, 10252, 19379, 17176,  9805, 18806,
        8468, 18675, 18029,  7722, 15410,  9197,  9445,  6562, 10779,
       17628, 14818,  9710, 12736,  7880,  6413,  7794, 15684,  5049,
       12437,  4565, 12964,  5962,  6568, 10984,  8548, 17015, 17990,
       10571, 16113,  9621, 19049, 18457,  8068, 12295,  4937,  9923,
        6862,  7680,  4179, 18772, 18748, 14749, 13821, 14037,  9870,
       18820, 13673,  9843,  9996,  9432, 19726,  5986, 18829,  9010,
       19934, 19475, 18390,  7321,  8931,  4446, 18287, 15750,  4190,
        7211,  8097, 19559, 18414,  7625, 18392, 13178,  9020, 10498,
       15418,  5792,  6927,  4446, 14125, 15510, 19286,  9291, 18161,
       16079, 17841, 14293,  5691, 14658, 10219, 16363, 17062, 18452,
        4390, 12444,  9975,  9087,  7767, 19552, 16811,  5045, 18785,
       15950,  9799,  5628])

In [81]:
# creating a Series of house prices ("Bei ya Manyumba" is Swahili for "price of houses")
prices = pd.Series(random_numbers, name= "Bei ya Manyumba")
prices

0      10031
1       9026
2       6613
3       7620
4      14198
       ...  
115     5045
116    18785
117    15950
118     9799
119     5628
Name: Bei ya Manyumba, Length: 120, dtype: int32

In [82]:
# Display the first 7 houses
prices.head(7)



0    10031
1     9026
2     6613
3     7620
4    14198
5    10022
6     6546
Name: Bei ya Manyumba, dtype: int32

In [83]:
#Display the last 7 houses
prices.tail(7)

113    19552
114    16811
115     5045
116    18785
117    15950
118     9799
119     5628
Name: Bei ya Manyumba, dtype: int32

In [85]:
# Create a function that takes in the price of a house and returns its category
def price_value(x):
    if x >= 18000:
        return "No Boma Yetu"
    elif x > 15000:  # x is already below 18000 here
        return "Boma Yetu"
    else:
        return "Civil Servant Quarter"

price_value(1799)

'Civil Servant Quarter'
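The same categories can be computed in a vectorized way with `np.select`, which checks a list of conditions in order and uses the choice of the first one that is True. A sketch on a few hypothetical prices chosen to sit on and around the 15,000 and 18,000 boundaries, checking that it agrees with `price_value`:

```python
import numpy as np
import pandas as pd

def price_value(x):
    if x >= 18000:
        return "No Boma Yetu"
    elif x > 15000:
        return "Boma Yetu"
    else:
        return "Civil Servant Quarter"

# Hypothetical sample prices around the category boundaries
prices = pd.Series([4000, 15000, 15001, 17999, 18000, 19999], name="Bei ya Manyumba")

# Conditions are checked in order; the first True one decides the label
conditions = [prices >= 18000, prices > 15000]
choices = ["No Boma Yetu", "Boma Yetu"]
vectorized = pd.Series(
    np.select(conditions, choices, default="Civil Servant Quarter"),
    index=prices.index,
    name="HouseCategories",
)

row_by_row = prices.map(price_value)
print((vectorized == row_by_row).all())  # True
```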

In [86]:
# Apply the categorization function to every price
price_series = prices.map(price_value)
price_series.rename("HouseCategories", inplace=True)
price_series.head(12)

0     Civil Servant Quarter
1     Civil Servant Quarter
2     Civil Servant Quarter
3     Civil Servant Quarter
4     Civil Servant Quarter
5     Civil Servant Quarter
6     Civil Servant Quarter
7     Civil Servant Quarter
8                 Boma Yetu
9     Civil Servant Quarter
10    Civil Servant Quarter
11    Civil Servant Quarter
Name: HouseCategories, dtype: object

In [87]:
price_series.tail(6)

114                Boma Yetu
115    Civil Servant Quarter
116             No Boma Yetu
117                Boma Yetu
118    Civil Servant Quarter
119    Civil Servant Quarter
Name: HouseCategories, dtype: object

In [89]:

# Increase every price by 10% due to the new tax laws
price_series2 = prices.apply(lambda x: x * 1.10)
price_series2


0      11034.1
1       9928.6
2       7274.3
3       8382.0
4      15617.8
        ...   
115     5549.5
116    20663.5
117    17545.0
118    10778.9
119     6190.8
Name: Bei ya Manyumba, Length: 120, dtype: float64

In [90]:
# Re-categorize the houses using the taxed prices
price_categories = price_series2.map(price_value)
price_categories.rename("HouseCategories", inplace=True)
price_categories.head(5)

0    Civil Servant Quarter
1    Civil Servant Quarter
2    Civil Servant Quarter
3    Civil Servant Quarter
4                Boma Yetu
Name: HouseCategories, dtype: object

In [91]:
# Apply a custom function to increase the property prices by an additional Ksh 250 for garbage.
def garbage_fee(price):
    return price + 250

In [92]:
price_series2

0      11034.1
1       9928.6
2       7274.3
3       8382.0
4      15617.8
        ...   
115     5549.5
116    20663.5
117    17545.0
118    10778.9
119     6190.8
Name: Bei ya Manyumba, Length: 120, dtype: float64

In [93]:
updated_prices = price_series2.apply(garbage_fee)


In [94]:
updated_prices

0      11284.1
1      10178.6
2       7524.3
3       8632.0
4      15867.8
        ...   
115     5799.5
116    20913.5
117    17795.0
118    11028.9
119     6440.8
Name: Bei ya Manyumba, Length: 120, dtype: float64

In [95]:
New_rates = {
    'Original Price': prices,
    'House Suitability': price_series,
    'House_levy %' : 10,
    'New Price': price_series2,
    'garbage_fee' : 250,
    'updated_prices': updated_prices
}
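Because the 10% levy and the Ksh 250 fee are plain arithmetic, the two `apply()` steps can also be written as one vectorized expression. A sketch on the first three prices from the output above; the lambda `lambda price: price + 250` stands in for `garbage_fee`, and the small DataFrame at the end shows how scalar values such as `10` and `250` in a dictionary are repeated on every row:

```python
import numpy as np
import pandas as pd

prices = pd.Series([10031, 9026, 6613], name="Bei ya Manyumba")

# Step by step, as in the exercise
taxed = prices.apply(lambda x: x * 1.10)
with_fee = taxed.apply(lambda price: price + 250)

# The same result as one vectorized expression
direct = prices * 1.10 + 250

# Scalars in the dict are broadcast: every row gets 10 and 250
table = pd.DataFrame({"Original Price": prices, "House_levy %": 10, "garbage_fee": 250})
print(table)
```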


In [96]:
# Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'
Moringa_property = pd.DataFrame(New_rates)
display(Moringa_property.head(7))

|    | Original Price | House Suitability | House_levy % | New Price | garbage_fee | updated_prices |
|---:|---:|:---|---:|---:|---:|---:|
| 0 | 10031 | Civil Servant Quarter | 10 | 11034.1 | 250 | 11284.1 |
| 1 | 9026 | Civil Servant Quarter | 10 | 9928.6 | 250 | 10178.6 |
| 2 | 6613 | Civil Servant Quarter | 10 | 7274.3 | 250 | 7524.3 |
| 3 | 7620 | Civil Servant Quarter | 10 | 8382.0 | 250 | 8632.0 |
| 4 | 14198 | Civil Servant Quarter | 10 | 15617.8 | 250 | 15867.8 |
| 5 | 10022 | Civil Servant Quarter | 10 | 11024.2 | 250 | 11274.2 |
| 6 | 6546 | Civil Servant Quarter | 10 | 7200.6 | 250 | 7450.6 |
