# Python in Use

Now that we have the basics of Python down, let's put them to use with a quick real-world example.

In this example we will use Python to scrape the disclosed meetings of [EU Commission president Ursula von der Leyen](http://ec.europa.eu/transparencyinitiative/meetings/meeting.do?host=c8e208ad-7dc2-4a97-acc9-859463c69ec4&d-6679426-p=1). We can then analyse and plot this data. 

## Importing modules

In the last tutorial we briefly touched on modules and how to import them. In this tutorial we will be using some very popular and powerful modules. Let's start by importing `pandas` and `time`.

In [2]:
import time
import pandas as pd

`time` is a module from Python's standard library for working with time. We will be using its `sleep` function to pause between requests, to make sure our scraper doesn't query the EU site too fast.
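
As a quick illustration of that pause, here is a minimal sketch using only the standard library. The half-second delay and the variable names are arbitrary choices for this example, not part of the scraper itself.

```python
import time

# Note the time, pause, then measure how long the pause actually took
start = time.perf_counter()
time.sleep(0.5)  # wait for (at least) half a second
elapsed = time.perf_counter() - start

print(f"Paused for about {elapsed:.2f} seconds")
```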

Pandas is a great module for working with data. It gives you the ability to create DataFrames which you can then easily analyse.

It also comes with the great function `read_html`, which reads the tables in an HTML page and returns them as a list of dataframes. This makes scraping the EU meetings site as easy as running one line of code.

To start, let's define a variable `url_start` holding the address we want to scrape.

In [3]:
url_start = 'http://ec.europa.eu/transparencyinitiative/meetings/meeting.do?host=c8e208ad-7dc2-4a97-acc9-859463c69ec4&d-6679426-p=1'

Now we call the `read_html` function from the pandas module.

In [4]:
df = pd.read_html(url_start)[0] # read_html returns a list with one dataframe per table on the page. This page has only one table, so we use [0] to get the dataframe out of the list


In [5]:
df

Unnamed: 0,Date,Location,Entity/ies met,Subject(s)
0,03/09/2021,"Evian, France",Allianz SE (Allianz Group),Meeting with CEO of Allianz
1,29/08/2021,Brussels,European Round Table for Industry (ERT),Dinner/ meeting with the ERT members on green ...
2,25/08/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Videoconference with Chairman of Volvo, Chairm..."
3,19/07/2021,Brussels,Bill & Melinda Gates Foundation (BMGF),Meeting with Co-chairman and co-founder of the...
4,09/07/2021,Brussels,Global Citizen,Meeting with the CEO from Global Citizen
5,24/04/2021,Brussels,Potsdam-Institut für Klimafolgenforschung (PIK),Meeting with Founding Director of the Potsdam ...
6,29/03/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Videoconference with Chairman of Volvo, Chairm..."
7,19/03/2021,Videoconference,Bundesverband der Deutschen Industrie e.V. (BDI),Meeting with BDI President
8,20/02/2021,Videoconference,Global Citizen,Meeting with the CEO from Global Citizen
9,19/02/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Meeting with Chairman of Volvo, Chairman of Si..."


And ta da! We have the first page of meetings. But to analyse all of the president's meetings we need to pull the data from all the pages and combine them. First, let's turn our little scraping code into a function. This might seem like a waste of time for one line of code (this is a remarkably small amount of code for a scraper), but it is good practice for building more complex scrapers.

In [10]:
def EU_meeting_scraper(url):
    # Read the first table on the page at `url` into a dataframe
    df = pd.read_html(url)[0]

    return df




Notice how the url has `-p=1` at the end of it. A guess and a quick experiment in the browser confirm that the number after `p=` corresponds to the page of meetings. So to get all the pages we need to loop through the page numbers and build a url for each one.
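
To see the page parameter more explicitly, here is a small sketch that uses the standard library's `urllib.parse` to pull the query string apart and then builds one url per page. The names `query`, `base` and `page_urls` are made up for this example, and the page count of 5 is taken from the loop used later in this tutorial.

```python
from urllib.parse import urlsplit, parse_qs

url_start = 'http://ec.europa.eu/transparencyinitiative/meetings/meeting.do?host=c8e208ad-7dc2-4a97-acc9-859463c69ec4&d-6679426-p=1'

# Split the part of the url after the "?" into its named parameters
query = parse_qs(urlsplit(url_start).query)
print(query['d-6679426-p'])  # the page parameter, currently ['1']

# Keep everything up to and including the last "=", then add each page number
base = url_start.rsplit('=', 1)[0] + '='
page_urls = [base + str(page) for page in range(1, 6)]

for page_url in page_urls:
    print(page_url)
```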

Now we can get all the meetings on the site using a loop. Be sure to read the comments (the muted text after each `#`), which explain what each bit of code is doing.

In [11]:
page_number = 1 # Start at the first page

df_list = [] # We create an empty list to add the meeting dataframes to as we go along
while page_number <= 5: # There are only 5 pages of meetings on the site
    print(page_number)

    # Remember what we learnt earlier about how to turn numbers into strings so we can add them to text
    url = 'http://ec.europa.eu/transparencyinitiative/meetings/meeting.do?host=c8e208ad-7dc2-4a97-acc9-859463c69ec4&d-6679426-p=' + str(page_number)

    # Now we pull the data from the url and save it as a temporary dataframe called df_temp
    df_temp = EU_meeting_scraper(url)

    # Now we add this dataframe to our list of dataframes
    df_list.append(df_temp)

    time.sleep(2) # Now we use the sleep function to tell Python to wait for two seconds so we don't annoy the EU's servers too much

    # Finally we add one to our page number so that we aren't pulling information from the same page over and over again. The EU really wouldn't like that.
    page_number = page_number + 1


1
2
3
4
5
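
The same loop can also be wrapped in a reusable function. The sketch below is one possible way to do that, not the only one. `scrape_pages` and `fake_fetch` are hypothetical names invented for this example, and `fake_fetch` is a stand-in that returns a label instead of contacting the EU site, so the code runs anywhere. In the notebook you could pass `EU_meeting_scraper` in its place and keep the default two-second delay.

```python
import time

BASE_URL = (
    'http://ec.europa.eu/transparencyinitiative/meetings/meeting.do'
    '?host=c8e208ad-7dc2-4a97-acc9-859463c69ec4&d-6679426-p='
)

def scrape_pages(fetch, last_page, delay=2.0):
    """Call fetch(url) for pages 1 to last_page, pausing between requests."""
    results = []
    for page_number in range(1, last_page + 1):
        url = BASE_URL + str(page_number)
        results.append(fetch(url))
        # No need to wait after the final page
        if page_number < last_page:
            time.sleep(delay)
    return results

def fake_fetch(url):
    # Stand-in for EU_meeting_scraper: returns a label built from the page number
    return 'page ' + url.rsplit('=', 1)[1]

pages = scrape_pages(fake_fetch, last_page=5, delay=0)
print(pages)
```

Using `for page_number in range(1, last_page + 1)` does the counting for us, so there is no `page_number = page_number + 1` line to forget.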


Now let's combine all the pages of data, which we have saved in `df_list`, into one dataset using the pandas `concat` function. We also add `reset_index` to reset the index so it counts all the rows in the dataframe, but don't worry too much about that.

In [12]:
df = pd.concat(df_list).reset_index(drop=True)

And we now have all the meetings!

In [13]:
df

Unnamed: 0,Date,Location,Entity/ies met,Subject(s)
0,03/09/2021,"Evian, France",Allianz SE (Allianz Group),Meeting with CEO of Allianz
1,29/08/2021,Brussels,European Round Table for Industry (ERT),Dinner/ meeting with the ERT members on green ...
2,25/08/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Videoconference with Chairman of Volvo, Chairm..."
3,19/07/2021,Brussels,Bill & Melinda Gates Foundation (BMGF),Meeting with Co-chairman and co-founder of the...
4,09/07/2021,Brussels,Global Citizen,Meeting with the CEO from Global Citizen
5,24/04/2021,Brussels,Potsdam-Institut für Klimafolgenforschung (PIK),Meeting with Founding Director of the Potsdam ...
6,29/03/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Videoconference with Chairman of Volvo, Chairm..."
7,19/03/2021,Videoconference,Bundesverband der Deutschen Industrie e.V. (BDI),Meeting with BDI President
8,20/02/2021,Videoconference,Global Citizen,Meeting with the CEO from Global Citizen
9,19/02/2021,Videoconference,Siemens AG (SAG) Volvo AB (Volvo Group) A.P....,"Meeting with Chairman of Volvo, Chairman of Si..."
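
If you are curious what `reset_index` did in the `concat` step, here is a small sketch on two tiny dataframes. The rows are made-up placeholders, not real meetings, and the names `page_1`, `page_2`, `stacked` and `combined` exist only in this example.

```python
import pandas as pd

# Two small stand-ins for pages of scraped meetings (placeholder values)
page_1 = pd.DataFrame({'Date': ['01/01/2021', '02/01/2021'],
                       'Location': ['Brussels', 'Videoconference']})
page_2 = pd.DataFrame({'Date': ['03/01/2021'],
                       'Location': ['Brussels']})

# concat stacks the rows, but each page keeps its own row numbers
stacked = pd.concat([page_1, page_2])
print(list(stacked.index))   # [0, 1, 0]

# reset_index(drop=True) renumbers the rows from 0 and discards the old numbers
combined = stacked.reset_index(drop=True)
print(list(combined.index))  # [0, 1, 2]
```

Without `drop=True`, pandas would keep the old row numbers as an extra column called `index`.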


## Next steps

That concludes the InfluenceMap coding tutorials. If you are eager to learn more, here are some good resources:

* Code Hub team - If you want to continue learning to code, feel free to reach out to the Code Hub team. We have weekly meetings to discuss code and are trying to support the development of coders at InfluenceMap.

* [First Python Notebook course](https://www.firstpythonnotebook.org/) - This is a free tutorial developed for journalists and makes an excellent next step following the completion of our tutorial.

* The [InfluenceMap Data Visualization guide](https://git.influencemap.org/JakeCarbone/DV-guide) - This is a guide I made to creating data visualizations, covering both the theory behind good design and how to use Plotly in Python.

* [InfluenceMap Gitea guide](https://git.influencemap.org/InfluenceMap/Gitea-setup-guide) - InfluenceMap uses an open-source code storage system called Gitea, which is built on git, the very popular tool for saving and sharing code. If you have ever heard of the site GitHub, that is the most common place for people to store their git repositories. This guide walks you through the setup and some basic use of git and Gitea, our version of GitHub.