In [4]:
from functools import reduce
arr = [1,2,3,4]

def multiply_array_by_five(item):
    return item * 5 # Multiplies one item; map(...) over arr => [5,10,15,20]

def accumulator(acc, item):
    return acc + item # Adds one item to the running total; reduce(...) over arr => 10

def is_even(item):
    return item % 2 == 0 # True for even items; filter(...) over arr => [2,4]



---


# Functional Programming
Functional programming is a programming paradigm in which functions compute and return values without side effects on the rest of the program. Each function depends only on its arguments, returns a result, and changes nothing outside its own scope.

Functions that follow this paradigm are shown above.

Multiply\_array\_by\_five() takes a single item and multiplies it by five. Passing it to map applies it to every item of an array of any length, giving the results shown above.
Map does not modify the original list; it returns a new object that yields the results.
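
To make the idea of side effects concrete, the sketch below contrasts a function that mutates its argument with one that returns a new list. The names `add_bonus_in_place` and `add_bonus_pure` are hypothetical and are not part of the notebook.

```python
def add_bonus_in_place(scores, bonus):
    # Impure: mutates the caller's list, so the effect leaks outside the function
    for i in range(len(scores)):
        scores[i] += bonus
    return scores

def add_bonus_pure(scores, bonus):
    # Pure: builds and returns a new list, leaving the input untouched
    return [score + bonus for score in scores]

original = [1, 2, 3]
result = add_bonus_pure(original, 10)
print(original)  # [1, 2, 3]
print(result)    # [11, 12, 13]

mutated = add_bonus_in_place(original, 10)
print(original)  # [11, 12, 13], the caller's list has changed
```

The pure version is the one that fits the functional style: calling it twice with the same input always gives the same output.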


**Map**(*function*, *iterable*) => returns a map object, a lazy iterator that can be converted to a list with list(), as shown below.

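**Filter**(*function*, *iterable*) => returns a filter object that can be converted
to a list with list(), as shown below. Filter keeps every item for which the function
returns a truthy value, so a function that returns a non-zero number for every item
leaves the list unfiltered.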

In [5]:
print(f'Map: {map(multiply_array_by_five, arr)}') # Show the map
print(f'List(Map): {list(map(multiply_array_by_five, arr))}') # Make it useful
print(f'Filter: {filter(is_even, arr)}') # Show the filter
print(f'List(Filter): {list(filter(is_even, arr))}') # Make the filter useful

# filter keeps the items for which the function returns a truthy value.
# multiply_array_by_five returns non-zero numbers here, so nothing is removed.
print(f'Incorrect Function: {list(filter(multiply_array_by_five, arr))}')

Map: <map object at 0x7f0e31063e20>
List(Map): [5, 10, 15, 20]
Filter: <filter object at 0x7f0e31063df0>
List(Filter): [2, 4]
Incorrect Function: [1, 2, 3, 4]
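
The "Incorrect Function" result above follows from Python's truthiness rules, not from filter ignoring the function. A minimal sketch, using made-up sample values:

```python
values = [0, 1, 2, '', 'text', None, [], [0]]

# filter keeps every item for which the function returns a truthy value
truthy_values = list(filter(bool, values))
print(truthy_values)  # [1, 2, 'text', [0]]

# Non-zero numbers are truthy, so a function returning x * 5 keeps them all
times_five = list(filter(lambda x: x * 5, [1, 2, 3, 4]))
print(times_five)  # [1, 2, 3, 4]

# Zero is falsy, so the same function drops it
with_zero = list(filter(lambda x: x * 5, [0, 1, 2]))
print(with_zero)  # [1, 2]
```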


**Reduce** accumulates every value of an array into a single item and returns it. The **accumulator** function above takes two arguments: the accumulator and the current item. The accumulator is the running result that reduce carries from one call to the next, and the item is the next element of the array.

**Reduce**(*function*, *iterable*) => returns a single object that is the result of applying the function cumulatively across the array. In this case, the **accumulator** function adds up the items of the array.

Reduce calls the function once per item, passing the previous result back in as the accumulator. It behaves much like the loop below, but without writing the loop yourself.

In [6]:
total = 0 # Named total to avoid shadowing the built-in sum()
for i in arr: # Similar to reduce. Iterate over the array and perform an action.
  total = total + i
print(f'Through loops: {total}')
print(f'Through Reduce: {reduce(accumulator, arr)}') # Return single value (int)
arr_str = ['H','e','l','l','o']
print(f'Reduced str: {reduce(accumulator, arr_str)}') # Return single value (string)
arr_mixed_types = ['3','2',3,2,1] # -> Reducing this raises TypeError: str + int is not supported
arr_mixed_num = [1,2,3.2,1,4.352] # -> + works on both float and int, so this works
print(f'Float/Int: {reduce(accumulator,arr_mixed_num)}')
# Average of arr
print(f'Average = {reduce(lambda x,y: x+y, arr)/len(arr)}')

Through loops: 10
Through Reduce: 10
Reduced str: Hello
Float/Int: 11.552
Average = 2.5
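
One way to picture what reduce does is to write the loop out by hand. The helper `reduce_with_loop` below is a hypothetical illustration, not the actual implementation in `functools`. The sketch also shows the optional third argument of `reduce`, a starting value.

```python
from functools import reduce

def reduce_with_loop(function, iterable, initial):
    # Hypothetical helper: carry a running result through every item
    acc = initial
    for item in iterable:
        acc = function(acc, item)
    return acc

def add(acc, item):
    return acc + item

numbers = [1, 2, 3, 4]
print(reduce_with_loop(add, numbers, 0))  # 10
print(reduce(add, numbers, 0))            # 10

# The third argument is the starting value, which also makes empty input safe
print(reduce(add, [], 0))                 # 0

# Without a starting value, an empty input raises TypeError
try:
    reduce(add, [])
except TypeError as error:
    print(f'TypeError: {error}')
```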


**Lambda Functions**
Lambda functions are anonymous functions, typically used where a small function is needed once or only occasionally. They are not defined with the usual def name(parameters) syntax; instead they use the syntax

```lambda parameters: expression``` <-- The result of the expression is returned.

Lambdas are useful when a small piece of functionality is needed only occasionally and does not justify a full named definition.
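
As a small sketch of the syntax, the two definitions below are equivalent. The names `add_ten`, `add_ten_lambda` and `join_words` are made up for illustration.

```python
# A regular definition
def add_ten(x):
    return x + 10

# The same behaviour as a lambda; the expression after the colon is returned
add_ten_lambda = lambda x: x + 10

print(add_ten(5))         # 15
print(add_ten_lambda(5))  # 15

# Lambdas can take several parameters, but only a single expression
join_words = lambda first, second: f'{first} {second}'
print(join_words('functional', 'programming'))  # functional programming
```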

Some example uses of lambda:

In [7]:
employees = [(3,'Rupali', ['Scala','Python']), 
             (2, 'Alexandra', ['Scala','Python']),
             (1,'Muhammad',['Scala','Python'])]
print(f'As initialised: {employees}')
employees.sort()
print(f'Sorted normally {employees}')
employees.sort(key = lambda x: x[1]) # Sort by name, not by ID
print(f'Sorted by name via lambda: {employees}')

# We can use lambda for the functions above.
# Using lambda and map to add 10 to an array
arr = [1,2,3,4]
print(f'Map Lambda {list(map(lambda x: x+10, arr))}') 
# Or to filter all odd numbers
print(f'Filter Lambda {list(filter(lambda x: x % 2 == 1, arr))}')
# Or to find the product of a list
print(f'Reduce Lambda {reduce(lambda x,y : x * y, arr)}')
# We can assign a lambda function to a variable and execute it like a normal fn
text = 'string'
shout = lambda s: s.upper() + '!'
print(shout(text))

As initialised: [(3, 'Rupali', ['Scala', 'Python']), (2, 'Alexandra', ['Scala', 'Python']), (1, 'Muhammad', ['Scala', 'Python'])]
Sorted normally [(1, 'Muhammad', ['Scala', 'Python']), (2, 'Alexandra', ['Scala', 'Python']), (3, 'Rupali', ['Scala', 'Python'])]
Sorted by name via lambda: [(2, 'Alexandra', ['Scala', 'Python']), (1, 'Muhammad', ['Scala', 'Python']), (3, 'Rupali', ['Scala', 'Python'])]
Map Lambda [11, 12, 13, 14]
Filter Lambda [1, 3]
Reduce Lambda 24
STRING!
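
The `sort()` calls above change the list in place. In keeping with the functional style, a possible alternative is the built-in `sorted()`, which takes the same `key` lambda but returns a new list. The shortened employee tuples below are an assumption made to keep the sketch small.

```python
employees = [(3, 'Rupali'), (2, 'Alexandra'), (1, 'Muhammad')]

# sorted() returns a new list and leaves the original untouched
by_name = sorted(employees, key=lambda employee: employee[1])
by_name_desc = sorted(employees, key=lambda employee: employee[1], reverse=True)

print(by_name)       # [(2, 'Alexandra'), (1, 'Muhammad'), (3, 'Rupali')]
print(by_name_desc)  # [(3, 'Rupali'), (1, 'Muhammad'), (2, 'Alexandra')]
print(employees)     # [(3, 'Rupali'), (2, 'Alexandra'), (1, 'Muhammad')]
```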




---


# Pandas


Pandas is a Python library with a wide range of uses in data handling and analysis, and it is commonly used alongside NumPy.

It is a fast and efficient library built around 'dataframes' for handling and manipulating data. Many of its operations are vectorised: they act on whole columns at once instead of looping over items in Python, which makes better use of the CPU.
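
A minimal sketch of what vectorisation looks like in practice, reusing the list from the functional programming section. The variable names are illustrative only.

```python
import pandas as pd

arr = [1, 2, 3, 4]
series = pd.Series(arr)

# Element by element, using map as in the functional programming section
mapped = list(map(lambda item: item * 5, arr))

# Vectorised: one expression applied to the whole Series at once
vectorised = series * 5

print(mapped)               # [5, 10, 15, 20]
print(vectorised.tolist())  # [5, 10, 15, 20]
```

Both give the same values; the vectorised form hands the loop to pandas and NumPy instead of running it in Python.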

It provides tools for handling missing data, reshaping, merging, indexing, and aggregating data.
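
As a rough sketch of merging and aggregating, the example below joins two small tables and counts rows per employee. The `skills` table and its contents are hypothetical, invented for this illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    'Employee_ID': [31, 3, 30, 27],
    'Employee Name': ['Alexandra', 'Rupali', 'Prosper', 'Devi'],
})

# Hypothetical lookup table: one row per employee skill
skills = pd.DataFrame({
    'Employee_ID': [31, 31, 3, 30],
    'Skill': ['Scala', 'Python', 'Python', 'Scala'],
})

# Merging: a left join keeps every employee, even those without a skill row
merged = employees.merge(skills, on='Employee_ID', how='left')

# Aggregating: count() ignores missing values, so Devi ends up with 0
skill_counts = merged.groupby('Employee Name')['Skill'].count()
print(skill_counts)
```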

Many of the core functions are written in C or Cython, which offsets much of the overhead that Python carries as a very high-level language.



---


**Core functionality.**

In [None]:
import pandas as pd
import numpy as np
# Create a dataframe through typing values.

data_for_df = {
    'Employee Name' : ['Alexandra', 'Rupali', 'Prosper', 'Devi'],
    'Employee_ID' : [31, 3, 30, 27],
    'Start Date' : ['20230124','20220601','20230124', '20230124']
}
df = pd.DataFrame(data_for_df)
print(df) # Dataframe!

# date range!
d = pd.date_range('20230124',periods=10)
df2 = pd.DataFrame( 
    np.random.randn(10,4), 
    index = d,
    columns = ['Loc','Lat','Lon','Name']
)
print(df2)

# DF can also be created through read_csv(), read_json(), read_pickle()

  Employee Name  Employee_ID Start Date
0     Alexandra           31   20230124
1        Rupali            3   20220601
2       Prosper           30   20230124
3          Devi           27   20230124
                 Loc       Lat       Lon      Name
2023-01-24  0.092959  2.151293 -0.477784 -0.859796
2023-01-25 -0.267840  0.095101  0.504584  1.303535
2023-01-26  1.018012 -0.626398 -0.266261  0.106551
2023-01-27 -1.484401  0.710981  0.977980 -0.153814
2023-01-28 -0.274777  0.624188 -0.187097 -1.996437
2023-01-29 -0.298495  0.097368 -0.032207  1.226168
2023-01-30 -0.988365 -1.946977 -0.097254  1.637589
2023-01-31  0.612816  0.826229  0.109971  0.798513
2023-02-01 -1.004896  0.135518 -0.221314 -0.301259
2023-02-02 -0.121610  1.109123  0.767189  2.206645
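
The cell above mentions `read_csv()` as another way to build a dataframe. The sketch below reads CSV text from an in-memory string, so that it runs without a file on disk, and then converts the date strings to real dates. The CSV content is a made-up copy of the first two rows above.

```python
import io
import pandas as pd

csv_text = """Employee Name,Employee_ID,Start Date
Alexandra,31,20230124
Rupali,3,20220601
"""

# Read the dates as strings first so they are not parsed as integers
df_from_csv = pd.read_csv(io.StringIO(csv_text), dtype={'Start Date': str})

# Convert the YYYYMMDD strings into datetime values
df_from_csv['Start Date'] = pd.to_datetime(df_from_csv['Start Date'], format='%Y%m%d')

print(df_from_csv)
print(df_from_csv.dtypes)
```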


In [None]:
# Locating and slicing data.

df.loc[0:1, 'Employee Name']  # Select "Employee Name" for index labels 0 to 1
                              # loc is label-based and includes both endpoints.
print(f'DF Index: {df.index}') # The index can be sliced.
print()
print(f'DF2 Index: {df2.index}') # Data indices
print()
# Collect the first three rows and the first two columns by position
print(f'Rows 0:3, Columns 0:2 \n{df.iloc[0:3,0:2]}') 
print()
print(f'Columns in df: {df.columns}')
print()

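# Boolean indexing through df[df['column'] <condition>]
# Note: ('Alexandra' or 'Rupali') evaluates to 'Alexandra', so only she is matched.
# To match either name, use df['Employee Name'].isin(['Alexandra', 'Rupali']).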
print(f"""Print info for Alexandra or Rupali, whoever is first \n 
      {df[df['Employee Name'] == ('Alexandra' or 'Rupali')]}
      """) 


DF Index: RangeIndex(start=0, stop=4, step=1)

DF2 Index: DatetimeIndex(['2023-01-24', '2023-01-25', '2023-01-26', '2023-01-27',
               '2023-01-28', '2023-01-29', '2023-01-30', '2023-01-31',
               '2023-02-01', '2023-02-02'],
              dtype='datetime64[ns]', freq='D')

Rows 0:3, Columns 0:2 
  Employee Name  Employee_ID
0     Alexandra           31
1        Rupali            3
2       Prosper           30

Columns in df: Index(['Employee Name', 'Employee_ID', 'Start Date'], dtype='object')

Print info for Alexandra or Rupali, whoever is first 
 
        Employee Name  Employee_ID Start Date
0     Alexandra           31   20230124
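
The output above contains only Alexandra because `('Alexandra' or 'Rupali')` is evaluated by Python before pandas sees it, and it evaluates to `'Alexandra'`. Two ways to select rows matching either name are sketched below on a rebuilt copy of the dataframe.

```python
import pandas as pd

df = pd.DataFrame({
    'Employee Name': ['Alexandra', 'Rupali', 'Prosper', 'Devi'],
    'Employee_ID': [31, 3, 30, 27],
})

# Option 1: isin() tests membership in a list of values
either = df[df['Employee Name'].isin(['Alexandra', 'Rupali'])]

# Option 2: combine two boolean masks with |, each wrapped in parentheses
either_mask = df[(df['Employee Name'] == 'Alexandra') | (df['Employee Name'] == 'Rupali')]

print(either)
```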
      


In [None]:
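# reindex returns a new DataFrame; df2 itself is left unchanged
df2_reindexed = df2.reindex(index=d[0:4], columns=list(df2.columns) + ['F'])

df2.loc[d[0]:d[2], 'F'] = 1 # Creates column F: 1 for the first three dates, NaN elsewhere
print(df2)
df2.isna() # Boolean mask, True where a value is missing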

In [None]:
df2_no_null = df2.dropna() # Returns a new df without the rows that contain nulls
print(df2_no_null)  # No nulls!
df2_filled_null = df2.fillna(value=2) # Replaces every null with 2. Returns new df
print(df2_filled_null)


                 Loc       Lat       Lon      Name    F
2023-01-24  0.475749  0.741898  0.366450  0.576687  1.0
2023-01-25 -1.747522  0.205583  0.712716 -0.259162  1.0
2023-01-26  0.084477 -0.609408  0.803665  0.123290  1.0
                 Loc       Lat       Lon      Name    F
2023-01-24  0.475749  0.741898  0.366450  0.576687  1.0
2023-01-25 -1.747522  0.205583  0.712716 -0.259162  1.0
2023-01-26  0.084477 -0.609408  0.803665  0.123290  1.0
2023-01-27 -1.239372 -0.662328 -1.074490  1.706764  2.0
2023-01-28  0.130732 -1.912783  0.168224 -0.388228  2.0
2023-01-29 -1.073162 -0.244553 -2.294275  0.554815  2.0
2023-01-30 -1.420320 -0.812712  1.182256  0.749206  2.0
2023-01-31 -0.348578 -0.033314 -0.188314  0.453793  2.0
2023-02-01  0.864936  1.181063  0.867173  0.825221  2.0
2023-02-02  0.863400 -1.335263  1.753393  0.335005  2.0
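
Both `dropna()` and `fillna()` accept options for finer control. The sketch below uses a small made-up frame to show `subset`, `how='all'`, and a per-column fill dictionary.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'Lat': [1.0, np.nan, np.nan],
    'Lon': [np.nan, np.nan, 6.0],
    'F': [1.0, np.nan, np.nan],
})

# Drop rows only when 'Lat' is missing
only_lat = frame.dropna(subset=['Lat'])

# Drop rows only when every column is missing
not_all_missing = frame.dropna(how='all')

# Fill each column with its own value; 'Lon' is not listed, so it keeps its nulls
filled = frame.fillna({'Lat': 0.0, 'F': 2.0})

print(only_lat.index.tolist())         # [0]
print(not_all_missing.index.tolist())  # [0, 2]
print(filled)
```

As with the calls above, each of these returns a new dataframe and leaves `frame` unchanged.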


