### Credits:

<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

This notebook was created by Zhuo Chen based on the notebooks created by [Nathan Kelber](http://nkelber.com) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email zhuo.chen@ithaka.org or nathan.kelber@ithaka.org<br />

Reused and modified for internal use at Università Cattolica del Sacro Cuore di Milano, by Deborah Grbac, email deborah.grbac@unicatt.it and Valentina Schiariti, email valentina.schiariti-collaboratore@unicatt.it, released under CC BY License.

This repository is founded on **Constellate notebooks**. The original Jupyter notebooks repository was designed by the educators at **ITHAKA's Constellate project**. The project was sunset on July 1, 2025. This repository uses and reuses Constellate notebooks as Open Educational Resources (OER), free for reuse under a Creative Commons CC BY License.
___

# Python Basics 4

**Description:** 
This lesson will describe:  
* What functions are;
* How to import external libraries and modules;
* How to write your own functions (`def` statements, local scope and global scope);
* The importance of functions and why to use them;
* Some popular Python packages and their usage.

This is part 4 of 5 in the series *Python Basics* that will prepare you to do text analysis using the Python programming language. 

**Note**: Running this notebook locally will give you full control to test, modify, and save your work. We strongly recommend downloading it before you begin.
___

## Functions

A **function** is a reusable **block of code** that performs a **specific task**. It can take **input values** (called **arguments**), process them, and optionally **return an output**. We have used several **Python functions** already, including `print()`, `input()`, and `range()`. 

You can identify a function by the set of parentheses `()` that follows its name: this is where the argument(s) are passed into the function. Depending on the function (and your goals for using it), a function may accept no arguments, a single argument, or many arguments. For example, when we use the `print()` function, we pass a string (or a variable containing a string) as an argument. For more details, refer to the official Python documentation for the function you’re interested in (here’s the link for the [`print()` function](https://docs.python.org/3/library/functions.html#print)).
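
As a quick illustration, the snippet below calls `print()` with no arguments, one argument, and several arguments, and calls `range()` with three arguments. `sep` is a real optional keyword parameter of `print()`. The values passed are arbitrary examples.

```python
# print() with no arguments prints an empty line
print()

# print() with a single argument
print('Hello')

# print() with several arguments: by default they are separated by a space
print('Python', 'Basics', 4)

# The optional keyword argument sep changes the separator
print('Python', 'Basics', 4, sep=' - ')

# range() can take three arguments: start, stop (excluded) and step
numbers = list(range(2, 10, 3))
print(numbers)  # [2, 5, 8]
```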

### Libraries and Modules

While Python comes with many built-in functions, there are thousands more that others have written. Adding them all to Python itself would create mass confusion, since different people could use the same name for functions that do different things. The solution is that **functions are stored in modules** that can be **imported** for use. A **module** is a Python file (extension ".py") that contains function definitions (and other Python code). Modules can in turn be collected into larger groups called **packages** and **libraries**. Depending on what your program needs, you may import a single module, a package of modules, or a whole library.

The general form of importing a module is:
`import module_name`
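
To see that a module really is just a `.py` file containing function definitions, here is a minimal sketch that writes a tiny module to a temporary folder and then imports it. The module name `greetings.py` and its `greet()` function are hypothetical. `tempfile`, `pathlib`, `sys.path` and `importlib.invalidate_caches()` come from the standard library and are used only to set up the demo. In everyday use, you would typically save the `.py` file in the same folder as your notebook and skip the `sys.path` step.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a temporary folder and write a tiny module file into it.
# The module name "greetings" and its greet() function are hypothetical examples.
folder = tempfile.mkdtemp()
module_code = 'def greet(name):\n    return "Hello, " + name + "!"\n'
pathlib.Path(folder, 'greetings.py').write_text(module_code)

# Let Python search that folder for modules, then import the new file by name
sys.path.insert(0, folder)
importlib.invalidate_caches()
import greetings

# Functions defined in the module are reached through the module name
message = greetings.greet('Sam')
print(message)  # Hello, Sam!
```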

For example, here's how you can import the **`time` module** and use its `sleep()` function, which pauses the program for a given number of seconds (in this case, 5 seconds).

In [1]:
# A program that waits five seconds then prints "Done"

import time # We import the `time` module; its functions are then available as `time.function_name()`

print('Waiting 5 seconds...')

time.sleep(5) # We run the sleep() function from the time module using `time.sleep()`
print('Done')

Waiting 5 seconds...
Done


We can also just import the `sleep()` function without importing the whole `time` module. The syntax is:

`from module import function`

In [2]:
# A program that waits five seconds then prints "Done"

from time import sleep # We import just the sleep() function from the time module

print('Waiting 5 seconds...')

sleep(5) # Notice that we just call the sleep() function, not time.sleep()
print('Done')

Waiting 5 seconds...
Done


Notice the **difference in syntax** depending on whether you import the entire module or just a single function:

* Importing the whole module: the function needs to be called with the module name first (`module_name.module_function()`);
* Importing a single function: the function is used on its own (`module_function()`).
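
The two styles can be compared side by side. This sketch uses the standard-library `math` module instead of `time` so that nothing has to wait. Both calls compute the same square root.

```python
# Option 1: import the whole module, then prefix the function with the module name
import math
root_a = math.sqrt(25)

# Option 2: import a single function, then call it on its own
from math import sqrt
root_b = sqrt(25)

print(root_a, root_b)  # 5.0 5.0
```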

## Writing a Function

In the above examples, we called functions that had already been written. However, we can also create our own functions!

The first step is to **define the function** before we call it. We use a **function definition statement** (starting with the keyword `def`), followed by an optional **function description** (called a *docstring*) and an indented **code block** containing the function's actions:

```
def my_function():
    """Description of what the function does"""
    pass  # Replace `pass` with the Python code to be executed
```


After the function is defined, we can **call** it whenever we need it by writing its name followed by parentheses:

`my_function()`

We can call a function as many times as we want without having to rewrite its code.

In the example below, we create a function called `complimenter_function` and then call it.

In [6]:
# Create a complimenter function
def complimenter_function():
    """prints a compliment""" # Function description statement
    print('You are looking great today!')

After you define a function, don't forget to call it to make it do the work!

In [5]:
# Give a compliment by calling the function
complimenter_function()

You are looking great today!


Ideally, a function's description (its docstring) should state what data the function takes in and whether it returns any data. Docstrings are written between triple quotes, using either single (`'''`) or double (`"""`) quotes, which allows the description to span multiple lines. To see a function's docstring, use the `help()` function.

In [8]:
# Examining the docstring of our function
# Note that we pass the function itself to help(), without parentheses
help(complimenter_function)

Help on function complimenter_function in module __main__:

complimenter_function()
    prints a compliment
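
Here is a sketch of a longer, multi-line docstring that states what the function takes in and what it returns. The function `describe_word()` is a hypothetical example. Besides `help()`, the docstring text is also stored in the function's `__doc__` attribute.

```python
# A hypothetical function with a multi-line docstring
def describe_word(word):
    """Print the number of characters in a word.

    Takes in: word, a string.
    Returns: nothing; the result is printed.
    """
    print(f'"{word}" has {len(word)} characters.')

# help(describe_word) would display this docstring; the same text is stored in __doc__
print(describe_word.__doc__)

describe_word('Python')  # "Python" has 6 characters.
```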



### Parameters vs. Arguments

When we write a function definition, we can define one or more **parameters**. A parameter is a variable named inside the parentheses of a function definition. It acts as a **placeholder** for the value (called an **argument**) that you pass to the function when you call it.

```
def my_function(input_variable):
    """Takes in X and returns Y"""
    result = input_variable  # Replace this line with the task to perform on input_variable
    return result
```


In the template above, `input_variable` is a parameter because it appears in the function *definition*. When we actually call our function, the variable or value we pass to it is called an argument.

In [10]:
# Change the complimenter function to give user-dependent compliment
def complimenter_function(user_name):
    """Takes in a name string, prints a compliment with the name"""
    print(f'You are looking great today, {user_name}!')

In [11]:
# Pass an argument to a function
complimenter_function('Sam')

You are looking great today, Sam!


In the above example, we passed a string into our function, but we could also pass a variable. Try this next. Since the `complimenter_function` has already been defined, you can call it in the next cell without defining it again.

In [2]:
# Ask the user for their name and store it in a variable called name
# Then call the complimenter_function and pass in the name variable
name = input("What's your name?")
complimenter_function(name)


What's your name? Valentina


You are looking great today, Valentina!
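
So far `complimenter_function` only prints. The next sketch shows the other case mentioned in the docstrings ("returns Y"). It uses a hypothetical `make_compliment()` with two parameters, which **returns** its result with the `return` statement instead of printing it. The function is called once with positional arguments, which are matched to parameters in order, and once with keyword arguments, which are matched by name.

```python
# A hypothetical variant of complimenter_function with two parameters and a return value
def make_compliment(user_name, adjective):
    """Takes in a name string and an adjective string, returns a compliment string"""
    return f'You are looking {adjective} today, {user_name}!'

# Positional arguments are matched to parameters in order
first = make_compliment('Sam', 'great')

# Keyword arguments are matched by name, so their order does not matter
second = make_compliment(adjective='sharp', user_name='Valentina')

print(first)   # You are looking great today, Sam!
print(second)  # You are looking sharp today, Valentina!
```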


### Local and Global Scope

Functions make code easier to maintain by avoiding duplication. One of the most dangerous areas for duplication is variable names: as programming projects grow, the chance that the same variable name is reused for different purposes goes up. This can cause errors that are hard to track down. We can reduce this problem through the concepts of local scope and global scope.

Variables defined inside a function and variables defined outside a function exist in two different scopes, which reduces the risk of the function code and the general code interfering with each other.

* When we create a variable inside a function, we call it a **local variable**. This variable exists only within the scope of that function and is “destroyed” once the function finishes running.

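* Variables created outside of any function are called **global variables**. These belong to the entire program and can be read from anywhere, including inside functions (assigning a new value to a global variable from inside a function requires the `global` keyword).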

Because of this, global variables can be used by any function, while local variables can only be used within the specific function in which they were defined.

Look at the example below:

In [3]:
# Demonstration of a global variable being used in a local scope
global_string = 'global'

def print_strings():
    print('We are in the local context:')
    local_string = 'local'
    print(global_string)
    print(local_string)
    

print_strings()

We are in the local context:
global
local


The code above defines a global variable `global_string` with the value 'global' and a local variable `local_string` with the value 'local'.

When we call the `print_strings()` function, it can access and print both the local and the global variable. This is because a function can use variables defined in its own (local) scope as well as variables in the global scope. If a local variable has the same name as a global one, the local variable takes precedence inside the function (it "shadows" the global one).

However, outside the function, only the global variable persists.

In [4]:
# The function has closed, now the local string has been discarded
print('We are now in the global context: ')
print(global_string)
print(local_string)

We are now in the global context: 
global


NameError: name 'local_string' is not defined

The local variable `local_string` exists only during the execution of `print_strings()` and is forgotten once the function finishes.

Ideally, Python programs should limit the number of global variables and create most variables in a local scope. This keeps confounding variables localized in the functions where they are used and then discarded.
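
The shadowing rule can be seen in a small sketch. The names `greeting`, `shadowing_demo()` and `reading_demo()` are hypothetical. Assigning to `greeting` inside a function creates a separate local variable and leaves the global one untouched. A function that only reads `greeting` falls back to the global value.

```python
greeting = 'hello from the global scope'

def shadowing_demo():
    # Assigning to greeting here creates a NEW local variable that shadows the global one
    greeting = 'hello from the local scope'
    return greeting

def reading_demo():
    # No local variable called greeting exists here, so Python uses the global one
    return greeting

local_result = shadowing_demo()
global_result = reading_demo()

print(local_result)   # hello from the local scope
print(global_result)  # hello from the global scope
print(greeting)       # hello from the global scope (the global variable is unchanged)
```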

## Why to Use Functions

There are several reasons why organizing code into functions is useful:

* **Avoid repetition**: instead of writing the same code multiple times, we define it once in a function and then call the function whenever we need it;
* **Improve readability**: functions break larger programs into smaller, manageable pieces, making them easier to read, understand, and maintain;
* **Easier debugging**: if something goes wrong, we can focus on the function definition instead of searching the entire codebase;
* **Easier modification**: similarly, to update our code (e.g., change the compliment given by `complimenter_function()`), we only need to update the function definition, and the change applies everywhere the function is called.

## Popular Python Packages

Modules containing functions for a similar type of task are often grouped together into a package. Here are some of the most popular packages used in Python:

### Processing and cleaning data

* [NumPy](https://numpy.org/) (num-pie)- Speeds up scientific computing on very large amounts of numerical data stored in arrays
* [pandas](https://pandas.pydata.org/)- Manipulates and analyzes tabular data, including moving, cleaning, and improving data quality

### Visualizing data
* [matplotlib](https://matplotlib.org/)- Creates static, animated, and interactive visualizations. 
* [Seaborn](https://seaborn.pydata.org/)- A library built on matplotlib that provides a "high-level interface for drawing attractive and informative statistical graphics"
* [Plotly](https://plotly.com/)- Create graphs, analytics, and statistics visualizations
* [Dash](https://plotly.com/dash/)- Create interactive web applications and dashboards

### Text Analysis
* [spaCy](https://spacy.io/)- Do text analysis on a large variety of languages
* [gensim](https://radimrehurek.com/gensim/)- Do topic modeling
* [NLTK](https://www.nltk.org/) (Natural Language Toolkit)- Access corpora and complete text analysis tasks like tokenization, stemming, tagging, parsing, and semantic reasoning

### Artificial Intelligence and Machine Learning

* [scikit-learn](https://scikit-learn.org/stable/)- Implement machine learning in areas such as classification, predictive analytics, regression, and clustering
* [Keras](https://keras.io/)- Implement deep learning using neural networks
* [TensorFlow](https://www.tensorflow.org/)- Implement machine learning with a particular focus on training and running deep neural networks
* [🤗 Transformers](https://huggingface.co/docs/transformers/index)- Easily work with a variety of pretrained models from Hugging Face 🤗

### Data Gathering

* [Requests](https://requests.readthedocs.io/en/latest/)- An HTTP client that helps connect to websites and download files
* [urllib3](https://urllib3.readthedocs.io/en/stable/)- Another HTTP client that helps connect to websites and download files
* [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)- Pull data out of HTML or XML files, helpful for scraping information from websites
* [Scrapy](https://scrapy.org/)- Helps extract data from websites

### Textual Digitization

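* [Tesseract](https://github.com/tesseract-ocr/tesseract)- Use optical character recognition (OCR) to convert images of text into plaintext (Tesseract is a standalone OCR engine; from Python it is usually driven through a wrapper package such as pytesseract)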
* [Pillow](https://pillow.readthedocs.io/en/stable/)- Read and manipulate images with Python

Packages are generally installed from [PyPI](https://pypi.org/), the official Python Package Index, using the `pip` tool. As of April 2022, there are over 350,000 packages available.

## How to Install a Python Package

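In a code cell, insert the following code, replacing `package_name` with the name of the package you would like to install:

`!pip install package_name`

The exclamation point tells Jupyter to run the line as a terminal (shell) command rather than as Python code. In recent versions of Jupyter, you can also write `%pip install package_name`, which installs the package into the same environment the notebook is running in.

Refer to the package's documentation for guidance.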
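
After installing, you may want to check whether Python can actually find a package. The sketch below uses the standard-library function `importlib.util.find_spec()`, which returns `None` when a module cannot be found. The helper name `is_installed()` is hypothetical. Note that the name you import can differ from the name you install. For example, `beautifulsoup4` is imported as `bs4`, `scikit-learn` as `sklearn`, and `Pillow` as `PIL`, so pass the *import* name.

```python
import importlib.util

def is_installed(module_name):
    """Takes in a module name string, returns True if Python can find that module"""
    return importlib.util.find_spec(module_name) is not None

print(is_installed('json'))    # True: json is part of the standard library
print(is_installed('pandas'))  # True only if pandas is installed in this environment
```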