# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Create Series and DataFrames
* Perform basic operations with Series
* Explore DataFrame basics
* Select data from DataFrames
* Apply functions to Series and DataFrames

## Loading Libraries

In [6]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd
data = [10, 20, 30, 40, 50]
labels = ['A', 'B', 'C', 'D', 'E']

# Create a Series with data and custom labels
series = pd.Series(data, index=labels)


## What is a Pandas Series?

* A **one-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, or *Python objects*.
* A Pandas Series is like a column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series holds data of a single data type (integer, float, string, etc.), ensuring homogeneity within the Series.
* **Labeled Index**: Each element in a Series is associated with a label; together the labels form the *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, Pandas automatically aligns the data based on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice, and other data sources.
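
The vectorized-operation and alignment features listed above can be sketched with two small Series. The values and labels here are made up purely for illustration; they are not part of the notebook's data.

```python
import pandas as pd

# Two small Series with partly overlapping labels (illustrative values)
a = pd.Series([10, 20, 30], index=["A", "B", "C"])
b = pd.Series([1, 2, 3], index=["B", "C", "D"])

# Vectorized operation: every element is doubled without an explicit loop
doubled = a * 2

# Alignment: values are matched by label, not by position.
# Labels that appear in only one of the two Series produce NaN.
total = a + b

print(doubled)
print(total)
```

Because `A` exists only in `a` and `D` only in `b`, those two labels have no partner to add to, so the result holds a missing value for each of them.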

In [7]:
# example of a Series created from a list
marks = [23, 19, 45, 67, 98, 100, 32, 76]

# create the Series
marks_series = pd.Series(marks)
marks_series

0     23
1     19
2     45
3     67
4     98
5    100
6     32
7     76
dtype: int64

## Creating and Displaying

In [8]:
#example 1 Creating a series from a list
data = [3.09, 3.7, 2.4, 1.234, 8.9, 3.4, 5]

list_series = pd.Series(data, name="Composite Scores")
list_series

0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Composite Scores, dtype: float64

In [9]:
type(list_series)

pandas.core.series.Series

In [10]:
# Creating a Series from a NumPy array
data_arr = np.array(data)  # create an array from the list

type(data_arr)

numpy.ndarray

In [11]:
#series from array
arr_series = pd.Series(data_arr, name="Array")

arr_series



0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Array, dtype: float64

In [12]:
# Series from a dictionary: the keys become the index labels
data_dict = {
    "Nakuru": 3.09,
    "Kisumu": 3.7,
    "Nyeri": 2.4,
    "Mombasa": 1.234,
    "Eldoret": 8.9,
    "Meru": 3.4,
    "Nairobi": 5,
}

type(data_dict)

dict

In [13]:
dict_series= pd.Series(data_dict, name= "COA Sores")
dict_series

Nakuru     3.090
Kisumu     3.700
Nyeri      2.400
Mombasa    1.234
Eldoret    8.900
Meru       3.400
Nairobi    5.000
Name: COA Sores, dtype: float64

In [14]:

Percent_score = [100, 65, 78, 34, 90, 86, 20]
custom_labels = [1, 2, 3, 4, 5, 6, 7]

# Create a pandas Series with custom labels and a specified name
custom_labels = pd.Series(data=Percent_score, index=custom_labels, name="COA Marks")

# Display the resulting Series
custom_labels


1    100
2     65
3     78
4     34
5     90
6     86
7     20
Name: COA Marks, dtype: int64

In [15]:
Percent_score = [100, 65, 78, 34, 90, 86, 20]
custom_labels = list(data_dict)  # use the dictionary keys (town names) as index labels

# Create a pandas Series with custom labels and a specified name
custom_labels = pd.Series(data=Percent_score, index=custom_labels, name="COA Marks")

# Display the resulting Series
custom_labels


Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

## Basic Operations With Series

In [16]:
arr_series

0    3.090
1    3.700
2    2.400
3    1.234
4    8.900
5    3.400
6    5.000
Name: Array, dtype: float64

In [17]:
print(arr_series[4])

8.9


In [18]:
# Example 1: access an element of a Series by position
# custom_labels has string labels, so use .iloc for positional access

elements = custom_labels.iloc[4]
print(elements)

90


In [19]:
dict_series

Nakuru     3.090
Kisumu     3.700
Nyeri      2.400
Mombasa    1.234
Eldoret    8.900
Meru       3.400
Nairobi    5.000
Name: COA Sores, dtype: float64

In [20]:
print(dict_series["Kisumu"])

3.7


In [21]:
custom_labels

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

In [22]:
print(custom_labels['Nakuru':'Eldoret'])

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Name: COA Marks, dtype: int64
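
The slice above includes `Eldoret` because label-based slices include both endpoints, whereas position-based slices exclude the stop position. A minimal sketch of the difference, using `.loc` for labels and `.iloc` for positions on a shortened copy of the marks (the name `scores` is introduced here only for illustration):

```python
import pandas as pd

scores = pd.Series(
    [100, 65, 78, 34, 90],
    index=["Nakuru", "Kisumu", "Nyeri", "Mombasa", "Eldoret"],
    name="COA Marks",
)

by_label = scores.loc["Kisumu"]   # look up by index label
by_position = scores.iloc[1]      # look up by position (0-indexed)

label_slice = scores.loc["Nakuru":"Nyeri"]   # includes the end label "Nyeri"
position_slice = scores.iloc[0:2]            # excludes position 2

print(by_label, by_position)
print(label_slice)
print(position_slice)
```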


In [23]:
# arithmetic operations
# convert the percentages to fractions between 0 and 1
x = custom_labels / 100
x


Nakuru     1.00
Kisumu     0.65
Nyeri      0.78
Mombasa    0.34
Eldoret    0.90
Meru       0.86
Nairobi    0.20
Name: COA Marks, dtype: float64

In [24]:
x = x * 100
x

Nakuru     100.0
Kisumu      65.0
Nyeri       78.0
Mombasa     34.0
Eldoret     90.0
Meru        86.0
Nairobi     20.0
Name: COA Marks, dtype: float64

In [25]:
# filter: keep only the elements that are at least 78
x_filtered = x[x >= 78]
x_filtered


Nakuru     100.0
Nyeri       78.0
Eldoret     90.0
Meru        86.0
Name: COA Marks, dtype: float64

In [26]:
# summary statistics
custom_labels

Nakuru     100
Kisumu      65
Nyeri       78
Mombasa     34
Eldoret     90
Meru        86
Nairobi     20
Name: COA Marks, dtype: int64

In [27]:
mean = x.mean()
print(mean)

67.57142857142857


In [28]:
std = x.std()
print(std)

29.999206338708046


In [29]:
max_value = x.max()  # avoid naming a variable `max`, which would shadow the built-in function
print(max_value)

100.0


## Applying Functions to a Series 

### Lambda Functions

* A lambda is a small anonymous function, i.e. one that is not bound to an identifier.
* It is similar to a user-defined function but without a name.
* It is simple and straightforward, requiring only the keyword `lambda`, the argument(s), and an expression.
* The body is a single expression, so a lambda fits on one line.

```
def func_name(parameters):
    code block
    
    return return_value
```

`func = lambda parameters: return_value`

* `lambda`: Keyword that indicates the definition of a lambda function.
* `parameters`: The input parameters that the lambda function takes.
* `return_value`: A single expression that defines the computation the lambda function performs and the value it returns.

In [30]:
# Compare the two
def square(x):

    out = x ** 2

    return out
square(7)

49

In [31]:
# lambda function
square_lambda = lambda x: x ** 2

square_lambda(8)

64

In [32]:
# a regular function that returns a greeting, for comparison with the lambda in the next cell
def print_hello():
    y = 'hello Universe'

    return y

print_hello()

'hello Universe'

In [33]:
letter = lambda: "hello Universe"

letter()



'hello Universe'

In [34]:
Opening_statement = lambda: "Join the clan"

Opening_statement()

'Join the clan'

In [35]:
def even_odd(number: int):
    """Check whether a number is even or odd."""
    if number % 2 == 0:
        return 'Even'
    else:
        return 'Odd'

even_odd(75)

'Odd'

In [36]:
even_lambda = lambda number: "Even" if number % 2 == 0 else "Odd"

even_lambda(76)

'Even'

In [37]:
even_lambda = lambda number: "Even" if number % 2 == 0 else "Odd"

even_lambda(79)

'Odd'

### Generate Random Numbers

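We use NumPy's `np.random.randint` to generate random integers. The lower bound is inclusive and the upper bound is exclusive, so `np.random.randint(1, 100, size=35)` draws 35 integers from 1 to 99.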

In [38]:
import numpy as np

In [39]:
random_numbers = np.random.randint(1, 100, size=35)

In [40]:
# how to display the generated numbers
random_numbers

array([ 3, 99, 21, 44, 94,  3, 65, 58, 37,  1, 64,  5, 72, 21, 13, 90, 81,
       67, 12, 22,  1, 72, 84, 90, 86, 73, 39, 47, 33, 30, 92, 71, 25, 20,
       69])
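
The array above changes every time the cell is run because no seed is set. If repeatable numbers are needed, one option is NumPy's `Generator` interface with a fixed seed. This is a minimal sketch; the seed value 42 and the variable names are arbitrary choices, not part of the notebook.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)     # the seed value is arbitrary
sample = rng.integers(1, 100, size=35)   # upper bound 100 is exclusive, as with randint

# A second generator with the same seed reproduces the same draw
rng_again = np.random.default_rng(seed=42)
same_sample = rng_again.integers(1, 100, size=35)

print(np.array_equal(sample, same_sample))
```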

In [41]:
import pandas as pd

In [42]:
numbers = pd.Series(random_numbers, name= "Evaluation Team")

numbers

0      3
1     99
2     21
3     44
4     94
5      3
6     65
7     58
8     37
9      1
10    64
11     5
12    72
13    21
14    13
15    90
16    81
17    67
18    12
19    22
20     1
21    72
22    84
23    90
24    86
25    73
26    39
27    47
28    33
29    30
30    92
31    71
32    25
33    20
34    69
Name: Evaluation Team, dtype: int32

In [43]:
# display the first five rows of the series
numbers.head()


0     3
1    99
2    21
3    44
4    94
Name: Evaluation Team, dtype: int32

In [44]:
# display the first 10 rows of the series
numbers.head(10)

0     3
1    99
2    21
3    44
4    94
5     3
6    65
7    58
8    37
9     1
Name: Evaluation Team, dtype: int32

In [45]:
# display last five rows
numbers.tail()

30    92
31    71
32    25
33    20
34    69
Name: Evaluation Team, dtype: int32

In [46]:
numbers.tail(7)

28    33
29    30
30    92
31    71
32    25
33    20
34    69
Name: Evaluation Team, dtype: int32

### Using the `apply()` Function in a Series

* `apply()` calls a function on every element of a Series, which makes it a flexible way to transform and analyze the data.
* Above, we generated a Series of random numbers and created a function called `square` that takes a number, squares it, and returns the result. Let's apply that function to every element of the Series.

In [47]:
# quick check of the square function on a single value
square(4)

16

In [48]:
# square the series random number
squared_numbers = numbers.apply(square)
squared_numbers

0        9
1     9801
2      441
3     1936
4     8836
5        9
6     4225
7     3364
8     1369
9        1
10    4096
11      25
12    5184
13     441
14     169
15    8100
16    6561
17    4489
18     144
19     484
20       1
21    5184
22    7056
23    8100
24    7396
25    5329
26    1521
27    2209
28    1089
29     900
30    8464
31    5041
32     625
33     400
34    4761
Name: Evaluation Team, dtype: int64
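
For simple arithmetic such as squaring, the same values can also be obtained with a vectorized expression instead of `apply()`. The sketch below compares the two on the first five numbers shown above; `values` is a stand-in name used only here.

```python
import pandas as pd

values = pd.Series([3, 99, 21, 44, 94], name="Evaluation Team")

def square(x):
    return x ** 2

applied = values.apply(square)   # calls square() once per element
vectorized = values ** 2         # one operation on the whole Series

# Both approaches give the same values
print((applied == vectorized).all())
```

`apply()` remains the more general tool, since it accepts any Python function, including ones with branching logic that cannot be written as a single arithmetic expression.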

In [49]:
# use .rename to rename the Series
squared_numbers.rename("Squared Numbers", inplace=True)  # inplace=True renames this Series object itself instead of creating a renamed copy



0        9
1     9801
2      441
3     1936
4     8836
5        9
6     4225
7     3364
8     1369
9        1
10    4096
11      25
12    5184
13     441
14     169
15    8100
16    6561
17    4489
18     144
19     484
20       1
21    5184
22    7056
23    8100
24    7396
25    5329
26    1521
27    2209
28    1089
29     900
30    8464
31    5041
32     625
33     400
34    4761
Name: Squared Numbers, dtype: int64

In [50]:
squared_numbers.head(4)

0       9
1    9801
2     441
3    1936
Name: Squared Numbers, dtype: int64

### `lambda` function with `apply()`

In [51]:
# Cube the numbers using lambda and apply
cubed_numbers = numbers.apply(lambda k: k **3)
cubed_numbers.head()

0        27
1    970299
2      9261
3     85184
4    830584
Name: Evaluation Team, dtype: int64

In [52]:
# Cube the numbers using lambda and apply
cubed_numbers = numbers.apply(lambda k: k **3)
cubed_numbers.tail()

30    778688
31    357911
32     15625
33      8000
34    328509
Name: Evaluation Team, dtype: int64

In [53]:
# rename the series
cubed_numbers.rename("Cubed Numbers", inplace=True)

0         27
1     970299
2       9261
3      85184
4     830584
5         27
6     274625
7     195112
8      50653
9          1
10    262144
11       125
12    373248
13      9261
14      2197
15    729000
16    531441
17    300763
18      1728
19     10648
20         1
21    373248
22    592704
23    729000
24    636056
25    389017
26     59319
27    103823
28     35937
29     27000
30    778688
31    357911
32     15625
33      8000
34    328509
Name: Cubed Numbers, dtype: int64

In [54]:
cubed_numbers.head()

0        27
1    970299
2      9261
3     85184
4    830584
Name: Cubed Numbers, dtype: int64

### Using the `map()` Function in a series

* Used to substitute each value in a Series with another value creating a convenient way to transform the values in a Series.
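
Besides a function, `map()` also accepts a dictionary that pairs each old value with its replacement. A minimal sketch with made-up labels follows; the Series `parity` and the 0/1 codes are illustrative assumptions.

```python
import pandas as pd

parity = pd.Series(["Odd", "Even", "Odd", "Even"], name="Parity")

# Substitute each label with a numeric code via a dictionary
codes = parity.map({"Even": 0, "Odd": 1})

# A value that is missing from the dictionary becomes NaN
partial = parity.map({"Even": 0})

print(codes)
print(partial)
```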

In [55]:
def bmi_value(x):

    if x >= 1345:
        return "Nonaless"
    else:
        return "Umenona"
    
bmi_value(1345)

'Nonaless'

In [56]:
# map each random number to "Nonaless" or "Umenona"
bmi_series = numbers.map(bmi_value)
bmi_series.rename("BMI Series", inplace=True)
bmi_series.head(8)


0    Umenona
1    Umenona
2    Umenona
3    Umenona
4    Umenona
5    Umenona
6    Umenona
7    Umenona
Name: BMI Series, dtype: object

### `lambda` function with `map()`

In [57]:
# use a lambda function with map() to double each number
double_number = numbers.map(lambda t: t* 2)
double_number.rename("Doubled Numbers", inplace=True)
double_number.head(5)

0      6
1    198
2     42
3     88
4    188
Name: Doubled Numbers, dtype: int64

### `lambda` function with Conditional Statement

In [58]:
# are the random numbers even or odd
even_odd_series = numbers.apply(lambda k: 'Even' if (k % 2 == 0) else 'Odd')
even_odd_series.rename("Even Odd Series", inplace=True)
even_odd_series.head()

0     Odd
1     Odd
2     Odd
3    Even
4    Even
Name: Even Odd Series, dtype: object

## Series to DataFrame 

* If a **Series** is like a *table* with a single column, a **DataFrame** is a *table* with one or more columns, where each column is a Series and all columns share the same index.

In [59]:
import pandas as pd

In [60]:
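# combine the Series into a DataFrame; each dictionary key becomes a column name
# (the keys below are plain strings, not the Series' .name attributes)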
mine_df = pd.DataFrame({
    "numbers.name": numbers,
    "cubed_numbers.name" : cubed_numbers,
    "squared_numbers.name" : squared_numbers,
    "bmi_series.name" : bmi_series,
    "even_odd_series": even_odd_series
}) 
mine_df

| | numbers.name | cubed_numbers.name | squared_numbers.name | bmi_series.name | even_odd_series |
|---|---|---|---|---|---|
| 0 | 3 | 27 | 9 | Umenona | Odd |
| 1 | 99 | 970299 | 9801 | Umenona | Odd |
| 2 | 21 | 9261 | 441 | Umenona | Odd |
| 3 | 44 | 85184 | 1936 | Umenona | Even |
| 4 | 94 | 830584 | 8836 | Umenona | Even |
| 5 | 3 | 27 | 9 | Umenona | Odd |
| 6 | 65 | 274625 | 4225 | Umenona | Odd |
| 7 | 58 | 195112 | 3364 | Umenona | Even |
| 8 | 37 | 50653 | 1369 | Umenona | Odd |
| 9 | 1 | 1 | 1 | Umenona | Odd |
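
Once the Series are combined, the DataFrame can be explored and sliced. The sketch below builds a small stand-in DataFrame (the names `nums` and `df` and the column names are assumptions for illustration) and shows a few basic ways to inspect it and select data.

```python
import pandas as pd

nums = pd.Series([3, 99, 21, 44], name="number")
df = pd.DataFrame({"number": nums, "squared": nums ** 2})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column

col = df["squared"]                  # one column name -> a Series
subset = df[["number", "squared"]]   # a list of column names -> a DataFrame
big = df[df["number"] > 20]          # boolean filter on rows
first_row = df.iloc[0]               # first row by position -> a Series

print(big)
```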


## Knock Yourself Out!

You work as a real estate agent at *MoringaHome Realty*. To assist your clients in making informed decisions about property investment, you decide to analyze property data using Pandas.

1. Generate 120 random numbers between Ksh 4,000 and Ksh 20,000 using NumPy to represent the prices of the houses.
2. Display the first 7 and the last 7 houses.
3. Create a function that takes in the price of a house and returns the category of that house, e.g. Suburb. The categories are of your own choosing.
4. Apply the function created above to the Series.
5. Apply a lambda function to increase the property prices by 10% due to the new tax laws.
6. Apply a custom function to increase the property prices by an additional Ksh 250 for garbage collection.
7. Create a new Series for each step and finally combine them all into a DataFrame named `Moringa_property`.

In [61]:
# Generate 120 random house prices (the upper bound 20000 is exclusive)
random_numbers = np.random.randint(4000, 20000, size=120)
random_numbers

array([17482, 12363,  5245, 14598,  4889,  5923, 19601, 12339,  9791,
       10699, 14358, 12704, 14819,  4291, 13790,  4073, 14061, 19529,
        4679,  9727,  9848, 17364, 19988,  9365, 19480,  6357, 18749,
       18005, 19245, 18422, 10578,  4864, 14920, 10113, 18789, 15368,
        9214, 13021, 18543,  5080, 13107, 19395,  9713, 19122, 17843,
        4143, 18683, 19150,  8161, 13436, 15146, 17799, 15481, 15864,
       14239,  9200, 11103,  9336, 19200, 10032,  7250,  9635, 14147,
       18574,  4514, 11074,  8386, 13749, 11672,  6592,  5828, 19745,
       11537,  9525, 17744,  7350, 18855,  5404,  9879,  6394, 17509,
        7740, 11478, 18449,  6442, 13686, 19371, 10341,  6485,  4421,
       14129,  8453,  8233,  9757, 11422,  4806,  9860, 15471, 15036,
        4550,  6104, 18836,  5999,  5307, 12927, 15773,  9560, 19324,
       10816,  6085,  6626, 13298, 15301, 10639, 17653, 13002,  9732,
       13712, 14304, 10754])

In [62]:
# creating a series
prices = pd.Series(random_numbers, name= "Bei ya Manyumba")
prices

0      17482
1      12363
2       5245
3      14598
4       4889
       ...  
115    13002
116     9732
117    13712
118    14304
119    10754
Name: Bei ya Manyumba, Length: 120, dtype: int32

In [63]:
#Display the first  7 houses
prices.head(7)



0    17482
1    12363
2     5245
3    14598
4     4889
5     5923
6    19601
Name: Bei ya Manyumba, dtype: int32

In [64]:
#Display the last 7 houses
prices.tail(7)

113    10639
114    17653
115    13002
116     9732
117    13712
118    14304
119    10754
Name: Bei ya Manyumba, dtype: int32

In [65]:
# Create a function that takes the price of a house and returns its category
def price_value(x):
    if x >= 18000:
        return "No Boma Yetu"
    elif 15000 < x < 18000:
        return "Boma Yetu"
    else:
        return "Civil Servant Quarter"
    
price_value(17999)

'Boma Yetu'

In [66]:
# Map each price to its category to create a new Series
price_series = prices.map(price_value)
price_series.rename("HouseCategories", inplace=True)
price_series.head(12)

0                 Boma Yetu
1     Civil Servant Quarter
2     Civil Servant Quarter
3     Civil Servant Quarter
4     Civil Servant Quarter
5     Civil Servant Quarter
6              No Boma Yetu
7     Civil Servant Quarter
8     Civil Servant Quarter
9     Civil Servant Quarter
10    Civil Servant Quarter
11    Civil Servant Quarter
Name: HouseCategories, dtype: object

In [67]:
price_series.tail(6)

114                Boma Yetu
115    Civil Servant Quarter
116    Civil Servant Quarter
117    Civil Servant Quarter
118    Civil Servant Quarter
119    Civil Servant Quarter
Name: HouseCategories, dtype: object

In [68]:

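# Increase each value by 10% with a lambda function.
# Caution: this cell operates on `numbers` (the 35 random integers from earlier), not on `prices`;
# use `prices.apply(...)` to raise the house prices themselves.
price_series2 = numbers.apply(lambda x: x * 1.10)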
price_series2


0       3.3
1     108.9
2      23.1
3      48.4
4     103.4
5       3.3
6      71.5
7      63.8
8      40.7
9       1.1
10     70.4
11      5.5
12     79.2
13     23.1
14     14.3
15     99.0
16     89.1
17     73.7
18     13.2
19     24.2
20      1.1
21     79.2
22     92.4
23     99.0
24     94.6
25     80.3
26     42.9
27     51.7
28     36.3
29     33.0
30    101.2
31     78.1
32     27.5
33     22.0
34     75.9
Name: Evaluation Team, dtype: float64

In [69]:
price_categories = price_series2.map(price_value)
price_categories.rename("HouseCategories", inplace=True)
price_categories.head(5)

0    Civil Servant Quarter
1    Civil Servant Quarter
2    Civil Servant Quarter
3    Civil Servant Quarter
4    Civil Servant Quarter
Name: HouseCategories, dtype: object

In [70]:
# Apply a custom function to increase the property prices by an additional Ksh 250 for garbage collection.
def garbage_fee(price):
    return price + 250

In [71]:
price_series2

0       3.3
1     108.9
2      23.1
3      48.4
4     103.4
5       3.3
6      71.5
7      63.8
8      40.7
9       1.1
10     70.4
11      5.5
12     79.2
13     23.1
14     14.3
15     99.0
16     89.1
17     73.7
18     13.2
19     24.2
20      1.1
21     79.2
22     92.4
23     99.0
24     94.6
25     80.3
26     42.9
27     51.7
28     36.3
29     33.0
30    101.2
31     78.1
32     27.5
33     22.0
34     75.9
Name: Evaluation Team, dtype: float64

In [72]:
updated_prices = price_series2.apply(garbage_fee)


In [73]:
updated_prices

0     253.3
1     358.9
2     273.1
3     298.4
4     353.4
5     253.3
6     321.5
7     313.8
8     290.7
9     251.1
10    320.4
11    255.5
12    329.2
13    273.1
14    264.3
15    349.0
16    339.1
17    323.7
18    263.2
19    274.2
20    251.1
21    329.2
22    342.4
23    349.0
24    344.6
25    330.3
26    292.9
27    301.7
28    286.3
29    283.0
30    351.2
31    328.1
32    277.5
33    272.0
34    325.9
Name: Evaluation Team, dtype: float64

In [74]:
New_rates = {
    'prices': prices,
    'price_series': price_series,
    'house_levy %' : 10,
    'price_series2': price_series2,
    'garbage_fee' : 250,
    'updated_prices': updated_prices
}


In [78]:
# Create a new Series for each step and finally combine them all into a DataFrame named 'Moringa_property'
Moringa_property = pd.DataFrame(New_rates)
display(Moringa_property)

| | prices | price_series | house_levy % | price_series2 | garbage_fee | updated_prices |
|---|---|---|---|---|---|---|
| 0 | 17482 | Boma Yetu | 10 | 3.3 | 250 | 253.3 |
| 1 | 12363 | Civil Servant Quarter | 10 | 108.9 | 250 | 358.9 |
| 2 | 5245 | Civil Servant Quarter | 10 | 23.1 | 250 | 273.1 |
| 3 | 14598 | Civil Servant Quarter | 10 | 48.4 | 250 | 298.4 |
| 4 | 4889 | Civil Servant Quarter | 10 | 103.4 | 250 | 353.4 |
| ... | ... | ... | ... | ... | ... | ... |
| 115 | 13002 | Civil Servant Quarter | 10 | NaN | 250 | NaN |
| 116 | 9732 | Civil Servant Quarter | 10 | NaN | 250 | NaN |
| 117 | 13712 | Civil Servant Quarter | 10 | NaN | 250 | NaN |
| 118 | 14304 | Civil Servant Quarter | 10 | NaN | 250 | NaN |
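
In the table above, `price_series2` and `updated_prices` have no values in the later rows. This is what happens when a DataFrame combines Series of different lengths: the columns are aligned on the index and the gaps are left as missing values. One possible way to work the exercise so that every step derives from the same 120-row price Series is sketched below. The seed, the column names, and the upper bound of 20001 (chosen so that Ksh 20,000 itself can be drawn) are assumptions made for this sketch, not part of the original solution.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatable output
prices = pd.Series(rng.integers(4000, 20001, size=120), name="price")

def price_value(x):
    """Categorise a price using the same thresholds as the notebook's price_value."""
    if x >= 18000:
        return "No Boma Yetu"
    elif 15000 < x < 18000:
        return "Boma Yetu"
    else:
        return "Civil Servant Quarter"

def garbage_fee(price):
    """Add the flat Ksh 250 garbage fee."""
    return price + 250

categories = prices.map(price_value)        # category of each house
taxed = prices.apply(lambda p: p * 1.10)    # 10% increase on the house prices
updated = taxed.apply(garbage_fee)          # taxed price plus garbage fee

# Every column comes from `prices`, so all columns share the same 120-row index
Moringa_property = pd.DataFrame({
    "price": prices,
    "category": categories,
    "price_plus_10_percent": taxed,
    "price_plus_garbage_fee": updated,
})

print(Moringa_property.head(7))
print(Moringa_property.tail(7))
```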
