# Variables

Variables are containers. A variable can hold a number, a word, the result of an operation, a list of these, or any other programming object. Once you store an object in a variable, you can use the variable's name to retrieve the object whenever you need it in later operations.

In [1]:
# Initialising variables with words
name = "Stefano"
surname = "Rapisarda"

# Initialising variables with numbers
age = 25

# Initialising variables with the result of an operation
full_name = name + " " + surname

# Initialising variables with a list of items
hobbies = ["music", "photography", "cooking", "yoyoing"]
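Once a value is stored, its name can be reused in later operations. The short sketch below repeats two of the variables defined above and derives new ones from them; `age_next_year` and `greeting` are hypothetical names introduced only for this illustration.

```python
# Variables repeated from the cell above so this example runs on its own
name = "Stefano"
age = 25

# Retrieve the stored values by name and use them in new operations
age_next_year = age + 1          # hypothetical name, for illustration
greeting = "Hello, " + name      # hypothetical name, for illustration

print(greeting)
print(age_next_year)
```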

# Functions

Functions in programming perform actions. In Python you can recognise functions because they are usually named with verbs and are always called with round parentheses. For example, `print()` is a function. A function performs an action and returns a result. Functions also accept parameters, i.e. values that they use to perform that action.

In [3]:
# Using a function with a single argument
print(full_name)

Stefano Rapisarda


In [6]:
# Using a function that returns a result
len(hobbies)

4

In [8]:
# Storing the result of a function in a variable, then using that variable as a parameter for another function
n_hobbies = len(hobbies)
print('Stefano has', n_hobbies, 'hobbies')

Stefano has 4 hobbies
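You can also write your own functions. The following is a minimal sketch, assuming a hypothetical function `count_items` that is not part of the original notebook: it accepts one parameter, performs an action (counting), and returns a result that can be stored in a variable.

```python
def count_items(items):
    """Return a short sentence describing how many items a list holds."""
    n_items = len(items)
    return "The list has " + str(n_items) + " items"


# List repeated from the cells above so this example runs on its own
hobbies = ["music", "photography", "cooking", "yoyoing"]

# The returned result is stored in a variable and then passed to print()
message = count_items(hobbies)
print(message)
```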


# Loops

Machines are very good at repeating the same thing over and over. In programming, you can repeat a single action or a block of actions using loops. There are several kinds of loops, and the `for` loop is one of them.

In [9]:
# Loop demonstration
for step in range(0,10):
    print(step)

0
1
2
3
4
5
6
7
8
9


The function `range(0,10)` creates a sequence of integers (numbers without decimals) from 0 to 9. Using the words `for` and `in`, we can *iterate* through these numbers. The syntax `for step in range(0,10)` repeats the instructions 10 times, and on each repetition the variable `step` takes the next number, from 0 up to 9. So in the first iteration `step` is 0, in the second it is 1, and so on. All the indented instructions following the `for` line (in our case, just the function `print(step)`) are repeated until the loop is done.
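To see for yourself that the end value is excluded, you can turn the range into a list. The sketch below also uses a loop to add the numbers up; `numbers` and `total` are names chosen only for this illustration.

```python
# Convert the range to a list to inspect the values it produces
numbers = list(range(0, 10))
print(numbers)        # the last value is 9, not 10
print(len(numbers))   # 10 values in total

# The loop body runs once per value; here it accumulates a running sum
total = 0
for step in range(0, 10):
    total = total + step
print(total)
```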

In [14]:
# Operations inside and outside the for loop
print("Stefano's hobbies:")
print('-'*30)
for i in range(len(hobbies)):
    print(hobbies[i])
print('-'*30)

Stefano's hobbies:
------------------------------
music
photography
cooking
yoyoing
------------------------------
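A `for` loop can also iterate over the items of a list directly, without going through positions. This is a sketch of that alternative, not a replacement for the cell above; `enumerate` is a built-in Python function that pairs each item with its position.

```python
# List repeated from the cells above so this example runs on its own
hobbies = ["music", "photography", "cooking", "yoyoing"]

# Iterate over the items themselves instead of their positions
for hobby in hobbies:
    print(hobby)

# When the position is needed too, enumerate() provides both
for position, hobby in enumerate(hobbies):
    print(position, hobby)
```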


# Conditional Statements (IF ... ELSE ...)
In programming you will often need to check whether a variable is equal to a specific number or word. For example, we may want to check whether "music" is one of Stefano's hobbies. To perform this kind of check we use conditional statements. A conditional statement checks whether a condition is true or not. We can then specify one operation or block of operations to execute if the condition is true, and another to execute if it is not.

In [16]:
for i in range(len(hobbies)):
    hobby = hobbies[i]
    if hobby == "music":
        print(hobby, " -> Wow, I also like music!")
    else:
        print(hobby, " -> Oh, I don't really like this hobby")

music  -> Wow, I also like music!
photography  -> Oh, I don't really like this hobby
cooking  -> Oh, I don't really like this hobby
yoyoing  -> Oh, I don't really like this hobby
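When the only question is whether a value appears in a list at all, Python's `in` operator can be used as the condition, with no loop needed. A minimal sketch, where `answer` is a name chosen only for this illustration:

```python
# List repeated from the cells above so this example runs on its own
hobbies = ["music", "photography", "cooking", "yoyoing"]

# The condition is true if "music" is one of the items in the list
if "music" in hobbies:
    answer = "music is one of the hobbies"
else:
    answer = "music is not one of the hobbies"

print(answer)
```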


# Python packages (Library)
One of the advantages of using a programming language as popular as Python is that its enormous community has already solved a lot of problems for you. While working on their projects, programmers create their own functions and objects and collect them in packages that other people can download and use. Packages are like libraries in the real world: in a real library you access information collected and organised by other people, and with Python packages you use functions developed by other programmers to solve many kinds of tasks. There are packages for plotting data, performing high-precision mathematical calculations, building user interfaces, building websites, and designing and training large language models. In our case, we will use the package `pandas`, a collection of tools for data science. We will use `pandas` functionality to solve three tasks.

In [20]:
# Importing the library into python
import pandas as pd

## Task 1: Clean up the data (remove the NaN)

### Step by step coding
- read the dataframe into a variable
- create a copy of the DataFrame
- store the value of the first row and column into a variable
- check if the variable is nan
- if the variable is nan, change its value (and print a message to record the change)
- iterate through all columns and rows
- write the cleaned DataFrame to a file: `df_cleaned.to_csv("library_employees_clean.csv", index=False)`
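The single-cell steps in this list (store one value, check it, replace it) can be sketched on a small table before scaling up to every row and column. The two-row DataFrame below is a hypothetical stand-in for the real CSV file, built in memory so the example runs without it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row table standing in for the real CSV file
df_missing = pd.DataFrame({
    "Name": [np.nan, "John"],
    "Role": ["Archivist", np.nan],
})

# Work on a copy so the original data stays untouched
df_cleaned = df_missing.copy()

# Store the value of the first row and first column in a variable
value = df_missing.iloc[0, 0]

# Check whether the value is missing and, if so, replace it in the copy
if pd.isna(value):
    df_cleaned.iloc[0, 0] = "unknown"
    print("Replaced a missing value in row 0, column 'Name'")

print(df_cleaned)
```

Only the first cell is handled here, so the missing `Role` in the second row is left as it is. Covering it is what the loop over all rows and columns is for.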

**Knowledge required**
- variables
- functions and methods
- package (library)
- loops

In [23]:
# A simple solution
df_missing = pd.read_csv("../data/library_employees.csv")
df_cleaned = df_missing.copy()
for index in df_missing.index:
    for column in df_missing.columns:
        value = df_missing.loc[index,column]
        if pd.isna(value):
            df_cleaned.loc[index,column] = "unknown"
df_cleaned
#df_cleaned.to_csv("library_employees_clean.csv", index=False)

Unnamed: 0,Name,Surname,Nationality,Role,Skills,Favourite Planet
0,Danielle,Johnson,Falkland Islands (Malvinas),unknown,"Metadata Tagging, Rare Book Handling",Mars
1,John,Taylor,Bosnia and Herzegovina,Archivist,"Rare Book Handling, Customer Service, AI-based...",Venus
2,Erica,Mcclain,unknown,Digital Preservationist,"Data Management, 3D Printing, AI-based Cataloging",Mars
3,Brittany,Johnson,Bermuda,Archivist,"Data Management, Metadata Tagging, Rare Book H...",Uranus
4,Jeffery,unknown,Anguilla,Archivist,unknown,Earth
5,Anna,Baldwin,Reunion,Research Assistant,"Metadata Tagging, Rare Book Handling, AI-based...",Saturn
6,Amy,Robinson,Namibia,Librarian,"AI-based Cataloging, Data Management",Saturn
7,Joshua,Booth,Senegal,Cataloguer,"Metadata Tagging, AI-based Cataloging, Data Ma...",Pluto
8,Linda,Wolfe,Guam,Librarian,"Rare Book Handling, Metadata Tagging, Coding",Saturn
9,Joshua,unknown,Serbia,Digital Preservationist,"Rare Book Handling, Data Management",Mars


In [26]:
# A PRO solution
df_missing = pd.read_csv("../data/library_employees.csv")
df_cleaned = df_missing.fillna("unknown")
#df_cleaned.to_csv("library_employees_clean.csv", index=False)
df_cleaned

Unnamed: 0,Name,Surname,Nationality,Role,Skills,Favourite Planet
0,Danielle,Johnson,Falkland Islands (Malvinas),unknown,"Metadata Tagging, Rare Book Handling",Mars
1,John,Taylor,Bosnia and Herzegovina,Archivist,"Rare Book Handling, Customer Service, AI-based...",Venus
2,Erica,Mcclain,unknown,Digital Preservationist,"Data Management, 3D Printing, AI-based Cataloging",Mars
3,Brittany,Johnson,Bermuda,Archivist,"Data Management, Metadata Tagging, Rare Book H...",Uranus
4,Jeffery,unknown,Anguilla,Archivist,unknown,Earth
5,Anna,Baldwin,Reunion,Research Assistant,"Metadata Tagging, Rare Book Handling, AI-based...",Saturn
6,Amy,Robinson,Namibia,Librarian,"AI-based Cataloging, Data Management",Saturn
7,Joshua,Booth,Senegal,Cataloguer,"Metadata Tagging, AI-based Cataloging, Data Ma...",Pluto
8,Linda,Wolfe,Guam,Librarian,"Rare Book Handling, Metadata Tagging, Coding",Saturn
9,Joshua,unknown,Serbia,Digital Preservationist,"Rare Book Handling, Data Management",Mars
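Both solutions should produce the same table. The sketch below checks this on a small hypothetical DataFrame built in memory (not the real CSV file), and also counts the missing values per column beforehand with `isna().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing values, standing in for the real file
df_missing = pd.DataFrame({
    "Name": ["Danielle", "Jeffery"],
    "Surname": ["Johnson", np.nan],
    "Role": [np.nan, "Archivist"],
})

# Count the missing values in each column before cleaning
print(df_missing.isna().sum())

# Loop-based solution: visit every cell and replace the missing ones
df_loop = df_missing.copy()
for index in df_missing.index:
    for column in df_missing.columns:
        if pd.isna(df_missing.loc[index, column]):
            df_loop.loc[index, column] = "unknown"

# One-call solution: fillna() replaces every missing value at once
df_fill = df_missing.fillna("unknown")

# Compare the two results
print(df_loop.equals(df_fill))
```

The loop makes each step explicit, which is useful while learning. `fillna` expresses the same intent in a single call.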
