# Project Report 
# FCP Assignment: Graphical Analysis and Simulation
***

## Overview



This project is a joint contribution of the efforts of Tom Mulholland, Dheeraj Polanki, Joe Taylor and Callan O'Brien.

The project's focus is on simulations and data relating to the coronavirus epidemic: specifically, the graphical analysis of Covid data in each of Bristol's MSOAs (Middle Super Output Areas), and a simulation of Covid spreading in a hypothetical setting.

The project contains:

* A main script that runs all the other scripts
* A GUI to select which parts of the project you want to access
* Argument parsing (argparse) to provide a command-line interface
* A choropleth map that plots Covid cases in Bristol
* A simulation of Covid spreading through a population
* A bar chart race animation displaying Covid cases in Bristol

## main
The main file contains the code that allows the program to be run and controlled from a terminal.

This is done using the argparse module. When creating an argument, the most important thing to decide is the 'action' it takes when called. Each mode of the program has the action 'store_true', which stores True if the flag is given and False if not. Combined with if statements, this allows the correct function to be run to produce the map or graph the user requested. For the simulation, the user can specify the values of the variables that govern how it runs. These arguments take the action 'store', and their 'dest' names the attribute under which each value is saved, so that it can be passed to the function that runs the simulation. Each of these arguments also has a 'default' value, which is used if it is not specified on the command line.

In [None]:
import argparse
from Simulator import Simulator
import numpy as np
import mapping
import User_Interface
import bar_chart_race

parser = argparse.ArgumentParser(prog= 'main',
                                 description= ("""
           ,'/ \`.
          |\/___\/|
          \'\   /`/
           `.\ /,'
              |
              |
             |=|
        /\  ,|=|.  /\
            
     '`.  \/ |=| \/  ,'`.
  ,'    `.|\ `-' /|,'    `.
,'   .-._ \ `---' / _,-.   `.
   ,'    `-`-._,-'-'    `.
  '                       `'
  Controllable -sim Variables (All 0-100)"""),epilog='Happy Infecting!',formatter_class=argparse.RawTextHelpFormatter)

parser.add_argument('-UI', action='store_true', help='Opens GUI')
parser.add_argument('-map', action='store_true', help='Creates choropleth map .gif ')
parser.add_argument('-bar', action='store_true', help='Creates bar chart race .gif')
parser.add_argument('-sim', action='store_true', help='Runs simulation. You can either use default values or select them individually using --contagiousness, --recovery, --deadliness')

parser.add_argument('-con','--contagiousness', type=int,action='store',default=90,
                    dest='infect_percentage',
                    help='(default 90, [0-100])The chance of contracting the virus upon contact',)
parser.add_argument('-rec','--recovery', type=int,action='store',default=70,
                    dest='recover_percentage',
                    help='(default 70, [0-100])The chance a person recovers from the virus, lower values increase the time taken to fight off the virus')
parser.add_argument('-dead','--deadliness', type=int,action='store',default=1,
                    dest='death_percentage',
                    help= '(default 1, [0-100])The chance a person dies as a result of the virus, RIP :(')
args = parser.parse_args()


The simulation variables must also be checked for validity and converted into probabilities for them to work in the simulation function. If a variable is entered out of range, an error is presented to the user:

In [None]:
# Check only the three simulation percentages (the mode flags are booleans)
for value in (args.infect_percentage, args.recover_percentage, args.death_percentage):
    if value < 0 or value > 100:
        raise argparse.ArgumentTypeError('Make sure your virus has parameter values in the range 0-100')

def percentage_to_probability(x):
    z = (x/100)
    return z

numberOFpeople = 100
radius = np.array([0.03 for i in range(numberOFpeople)])

#This is passed when creating the players to make all the players healthy/grey
c0l0r = {"edgecolor": "black", "linewidth": 1, "fill": True, "facecolor":"grey"}
def sim_in_arg():
    infect_probability = percentage_to_probability(args.infect_percentage)
    recover_probability = percentage_to_probability(args.recover_percentage)
    death_probability = percentage_to_probability(args.death_percentage)
    Sim = Simulator.Simulation(numberOFpeople, infect_probability, recover_probability, death_probability, radius, c0l0r)
    Sim.do_animation()
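
As an alternative to checking the values after parsing, the range check can be attached to each argument through argparse's 'type' parameter. The sketch below is only an illustration: the helper name `percentage` is hypothetical and not part of the project, and the argument list is passed in explicitly so that the example runs without a terminal.

```python
import argparse


def percentage(text):
    """Hypothetical helper: parse a command-line string as an integer in [0, 100]."""
    value = int(text)  # a ValueError here is also reported by argparse as a usage error
    if not 0 <= value <= 100:
        raise argparse.ArgumentTypeError(f'{value} is not in the range 0-100')
    return value


def percentage_to_probability(x):
    return x / 100


parser = argparse.ArgumentParser(prog='main')
parser.add_argument('-con', '--contagiousness', type=percentage, default=90,
                    dest='infect_percentage',
                    help='(default 90, [0-100]) The chance of contracting the virus upon contact')

# Passing the argument list explicitly stands in for typing "main -con 25"
args = parser.parse_args(['-con', '25'])
infect_probability = percentage_to_probability(args.infect_percentage)
print(infect_probability)  # 0.25
```

With this approach an out-of-range value is reported by argparse itself as a usage error, rather than as an exception raised after parsing.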

We decided that only one mode of the program should be able to run at a time, making the modes mutually exclusive. Therefore, before any mode is called, the arguments must pass through another check to see whether multiple modes were selected at the same time:

In [None]:
# Count how many mode flags were given; only one is allowed at a time
selected_modes = sum([args.UI, args.sim, args.map, args.bar])
if selected_modes > 1:
    raise argparse.ArgumentTypeError('Please select only one of: -UI -sim -map -bar')

if args.UI:
    User_Interface.runUI()
elif args.sim:
    sim_in_arg()
    print(vars(args)) #shows values of arguments in command line
elif args.map:
    mapping.create_choropleth()
elif args.bar:
    bar_chart_race.genBarChart()
else:
    raise argparse.ArgumentTypeError('Please enter a valid mode')
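
The same "exactly one mode" rule can also be expressed with argparse's built-in mutually exclusive groups. This is a sketch of an alternative, not the approach used in the project; the helper `is_rejected` is hypothetical and exists only to show which argument lists argparse refuses.

```python
import argparse

parser = argparse.ArgumentParser(prog='main')

# required=True means exactly one of the four mode flags must be given
modes = parser.add_mutually_exclusive_group(required=True)
modes.add_argument('-UI', action='store_true', help='Opens GUI')
modes.add_argument('-sim', action='store_true', help='Runs simulation')
modes.add_argument('-map', action='store_true', help='Creates choropleth map')
modes.add_argument('-bar', action='store_true', help='Creates bar chart race')

args = parser.parse_args(['-map'])
print(args.map, args.sim)  # True False


def is_rejected(argv):
    """Hypothetical helper: True if argparse exits with a usage error for argv."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return True
    return False


print(is_rejected(['-map', '-bar']))  # True: two modes at once
print(is_rejected([]))                # True: no mode selected
```

The benefit of this design is that the error message and the usage line in the help text are generated by argparse, so they stay in step with the list of modes.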


## Data_processing

The project is based on simulations and data relating to COVID. Specifically, it covers COVID cases in Bristol, separated by MSOA (Middle Super Output Area). The data was acquired from the following URL:

* Number of new covid cases per week for each Middle Super Output Area (MSOA) in Bristol: 
(https://coronavirus.data.gov.uk/details/download)

The data file, covid_data_bristol.csv, has over 4000 entries. Many of these are not needed for the project, so the data must be formatted and processed into a database that can be easily manipulated:

In [None]:
import pandas as pd
import numpy as np


def create_database():
    #Reads the csv into a dataframe so it can be manipulated
    dF = pd.read_csv('data/covid_data_bristol.csv')
    #print(dF)
    
    #Just taking out all the columns of info we do not need
    dF = dF.drop(columns=['regionCode', 'regionName', 'UtlaCode', 'UtlaName',
                          'LtlaCode', 'LtlaName', 'areaType'])
    
    return dF

Using Pandas, this function converts covid_data_bristol.csv into a clean, formatted database for use in other sections of the project. Most of the removed columns hold static, repeated values that serve no purpose here.
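
To illustrate what "static repeated" columns look like, the sketch below builds a tiny stand-in for the csv and finds the columns that hold a single value on every row. The area codes and case counts are made up for illustration and are not taken from the real data file.

```python
import pandas as pd

# Tiny stand-in for covid_data_bristol.csv (placeholder codes and counts)
raw = pd.DataFrame({
    'regionName': ['South West', 'South West'],
    'areaType': ['msoa', 'msoa'],
    'areaCode': ['A1', 'A2'],
    'date': ['2022-01-01', '2022-01-08'],
    'newCasesBySpecimenDateRollingSum': [12, 30],
})

# A column with only one distinct value carries no information
constant_columns = [col for col in raw.columns if raw[col].nunique() == 1]
clean = raw.drop(columns=constant_columns)

print(constant_columns)     # ['regionName', 'areaType']
print(list(clean.columns))  # ['areaCode', 'date', 'newCasesBySpecimenDateRollingSum']
```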

## Choropleth map

I was inspired to create a choropleth map of Covid infection rates in Bristol by the sheer number of them that I saw in the news over the course of the Covid-19 pandemic, as well as the live map on the government website:

(https://coronavirus.data.gov.uk/details/interactive-map/cases).

To make this plot I needed two sets of data:

* Number of new covid cases per week for each Middle Super Output Area (MSOA) in Bristol 
(https://coronavirus.data.gov.uk/details/download)
* Shapefile that contains the geometry of Bristol split into its MSOAs 
(https://opendata.bristol.gov.uk/explore/dataset/msoa11/information/?location=11,51.47093,-2.61604&basemap=jawg.streets)

The final output is shown in the gif below:

In [19]:
from mapping import create_choropleth
from IPython.display import HTML
create_choropleth()
HTML('<img src="bristol_covid_map.gif">')


Creating choropleth map
Saved as bristol_covid_map.gif


<Figure size 432x288 with 0 Axes>

Viewing this data on the map is a good way of getting an overview of the pandemic. Each wave that followed the discovery of a new variant shows up clearly as a sudden jump in current cases across all the zones, especially with the highly transmissible Omicron variant in early 2022.

It is also interesting to see how the cases are distributed during these spikes. The highest rates of new cases are normally in the central regions of the city, but also in areas like Redland, Cotham and Clifton where there is a very dense student population. This student population means you can also see spikes in October of 2020 and 2021, when students returned to the city for the new university year. This would mainly be due to the increase in population, but also to students showing fewer symptoms and therefore bringing cases to the city without realising.

To create this map, two functions are used. The first, geodf_create(), creates a geodataframe and manipulates it until it is in the correct form to be plotted by geopandas. The second, plot_animation(), uses FuncAnimation to create an animation from the data in the geodataframe.

geodf_create() first creates a geodataframe from the shapefile, which will store the geometry required for each MSOA. It then strips any unnecessary data and orders the MSOAs by their area code:

In [None]:
import geopandas as gpd
msoa_bristol = gpd.read_file(r'data/bristol_msoa11_map_data/msoa11.shp')

msoa_bristol.pop('lacd')
msoa_bristol.pop('area_m2')
msoa_bristol.pop('perimeter_m')
msoa_bristol.pop('mi_prinx')
msoa_bristol.pop('objectid')
msoa_bristol.pop('msoa11nm')

msoa11cd_end = []
for msoa11cd in msoa_bristol['msoa11cd'].values:
    msoa11cd_end.append(msoa11cd[-4:])

msoa_bristol['msoa11cd_end'] = msoa11cd_end
msoa_cases_ordered = msoa_bristol.sort_values(by=['msoa11cd_end'])
msoa_cases_ordered.pop('msoa11cd_end')
msoa_cases_ordered = msoa_cases_ordered.rename(columns = {'msoa11cd':'areaCodes'})

print(msoa_cases_ordered.head(5))
print('.....')
print('.....')
print(msoa_cases_ordered.tail(5))

It then calls create_database() from data_processing.py to import a cleaned dataframe of the covid data for Bristol. All unneeded data is then removed:

In [None]:

import data_processing as dp
covid_data = dp.create_database()

covid_data.pop('areaName')
covid_data.pop('newCasesBySpecimenDateChange')
covid_data.pop('newCasesBySpecimenDateChangePercentage')
covid_data.pop('newCasesBySpecimenDateDirection')
covid_data.pop('newCasesBySpecimenDateRollingRate')

print(covid_data)

Two lists are created, one with every date a reading was taken and another with each area code:

In [None]:
unique_dates = []
for date in covid_data['date']:
    if date not in unique_dates:
        unique_dates.append(date)
unique_dates.reverse()

unique_area_codes = []
for area_code in covid_data['areaCode']:
    if area_code not in unique_area_codes:
        unique_area_codes.append(area_code) 

print(f'unique dates = {unique_dates}\n')
print(f'unique area codes = {unique_area_codes}')
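
The two loops above can also be written with pandas' `unique()`, which returns values in order of first appearance. A minimal sketch on placeholder data, assuming (as the `reverse()` call suggests) that the file lists the newest dates first:

```python
import pandas as pd

# Placeholder rows: newest date first, two made-up area codes
covid_data = pd.DataFrame({
    'date': ['2022-01-08', '2022-01-08', '2022-01-01', '2022-01-01'],
    'areaCode': ['A1', 'A2', 'A1', 'A2'],
})

# unique() keeps first-appearance order, like the "if x not in list" loops
unique_dates = covid_data['date'].unique().tolist()
unique_dates.reverse()  # oldest date first
unique_area_codes = covid_data['areaCode'].unique().tolist()

print(unique_dates)       # ['2022-01-01', '2022-01-08']
print(unique_area_codes)  # ['A1', 'A2']
```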

Geopandas requires the data for each date to be in its own column, but the downloaded .csv stores each date's data across multiple rows. This means we have to sort through the data.
The loop below iterates through the list of dates and finds all data for the selected date. The area code linked to each data point is looked up and used to put the data point in the correct place in a clean dataframe. After each date, the dataframe is copied back to the same variable name to de-fragment it.

Once the loop is finished, all empty cells in the dataframe are given a zero value and it is then merged with the geometry data to give us the final geodataframe:

In [None]:
import pandas as pd

unique_data = {'areaCodes': unique_area_codes}

cases_columns = pd.DataFrame(unique_data)

cases_columns_index_list= cases_columns.index.tolist()

areacode_index_dict = dict(zip(unique_area_codes, cases_columns_index_list))

for date in unique_dates:
    index_count = 0
    for covid_data_date in covid_data['date'].values:
        if date == covid_data_date:
            # Find the row for this area code and store its weekly case count
            current_area_code = covid_data.at[index_count, 'areaCode']
            cases_columns_index = areacode_index_dict[current_area_code]
            cases_columns.at[cases_columns_index, date] = covid_data.at[index_count, 'newCasesBySpecimenDateRollingSum']
        index_count += 1
    # Copy once per date (not once per row) to de-fragment the dataframe
    cases_columns = cases_columns.copy()

cases_columns = cases_columns.fillna(0)

msoa_cases_ordered = msoa_cases_ordered.merge(cases_columns)

print(msoa_cases_ordered.head(5))
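
The long-to-wide reshaping performed by the loop can also be sketched with pandas' `pivot`. This is an alternative shown on made-up data, not the method used in mapping.py; the area codes and counts are placeholders.

```python
import pandas as pd

# Long format: one row per (area, date) pair, as in the downloaded csv
long_data = pd.DataFrame({
    'areaCode': ['A1', 'A2', 'A1'],
    'date': ['2022-01-01', '2022-01-01', '2022-01-08'],
    'newCasesBySpecimenDateRollingSum': [5, 7, 9],
})

# Wide format: one row per area, one column per date, gaps filled with zero
wide = (long_data
        .pivot(index='areaCode', columns='date',
               values='newCasesBySpecimenDateRollingSum')
        .fillna(0)
        .reset_index()
        .rename(columns={'areaCode': 'areaCodes'}))
wide.columns.name = None  # drop the leftover 'date' label on the column axis

print(wide)
```

Area A2 has no reading for the second date, so that cell is filled with zero, which mirrors the `fillna(0)` step above.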

plot_animation() then takes this geodataframe and the list of dates and iterates through each date to create the frames of the animation. It then saves the animation as a gif:

In [None]:
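import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from mpl_toolkits.axes_grid1 import make_axes_locatable

# fig and ax are the matplotlib figure and axes created earlier in plot_animation();
# geo_df and dates are the geodataframe and the list of dates passed to it.
vmin, vmax = 0, 350
color_map = 'Reds'
divider = make_axes_locatable(ax)
cax = divider.append_axes('right', size='5%', pad=0.1)

def animate(i):
    # Redraw the map for the i-th date
    ax.clear()
    geo_df.plot(column = dates[i], cmap = color_map, ax=ax, figsize=(10,10), linewidth=0.6, edgecolor = 'black', vmin=vmin, vmax=vmax, legend=True, cax=cax)
    ax.set_title(f'Number of new cases in week beginning {dates[i]}', fontdict={'fontsize': '15', 'fontweight' : '3'})
    ax.set_axis_off()


# frames=len(dates) gives one frame per date, so dates[i] never runs past the end of the list
ani = FuncAnimation(fig, animate, frames=len(dates), interval=200, repeat=False, blit=False)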

plt.close()
ani.save('bristol_covid_map.gif', dpi=300)

Both functions described above are then called from the create_choropleth() function, along with some print statements that show the status of the code to the user. This allows the whole process to be run from any other file with a single function call:

In [None]:
def create_choropleth():
    print('Creating choropleth map')
    geodf, dates = geodf_create()
    ani = plot_animation(geodf, dates)
    print('Saved as bristol_covid_map.gif')

## Covid simulation

The Covid simulation generates a playground (box) of 100 people (circles) and infects 1 random person in the playground. The simulation needs 3 inputs: infection_percentage, recover_percentage and death_percentage.

This type of simulation also shows how herd immunity works: a person can avoid infection because the people around them are immune.

The Simulation class generates the playground and uses the Person class to plot all 100 people at random locations inside the box, assigning each a random velocity, a "healthy" status and a radius of r:


In [None]:
Simulation(numberOFpeople, infect_probability, recover_probability, death_probability, radius, c0l0r)

Where:

numberOFpeople = 100,

infect_probability, recover_probability and death_probability are probabilities between 0 and 1 (main converts the 0-100 percentages into these),

radius = np.array([0.03 for i in range(numberOFpeople)]),

c0l0r = {"edgecolor": "black", "linewidth": 1, "fill": True, "facecolor":"grey"}.

These values can be changed; however, the recommended setting is:

numberOFpeople=100,

radius=np.array([0.03 for i in range(numberOFpeople)]).

When the Simulation class is called, numberOFpeople, radius and c0l0r are given as inputs to the init_particles function:

In [None]:
def init_particles(self, population, radius, c0l0r):
    """
    Generates the people for the simulation.
    """

    self.population = population
    self.people = []
    for i, radius1 in enumerate(radius):
        key = i
        # Try to find a random initial position for this person.
        while not self.place_Person(radius1, c0l0r, key):
            #print("person added")
            pass
    self.start_infection()

This function calls place_Person(radius1, c0l0r, key), which generates random x and y coordinates and vx and vy velocities for the person and plots them on the graph using the Person class. If a generated position is on top of an already existing person, a new position is generated.

Warning: The vx and vy velocities should not be tampered with. If the velocities are too high, people will leave the playground (box) between frames, or overlap and start glitching.

In [None]:
def place_Person(self, radius1, c0l0r, key):
    """
    Positions and velocities are chosen randomly. If a generated position overlaps
    with another person on the grid, False is returned so a new position is generated.
    """

    # Generates random x, y so that the person is inside the grid
    x = radius1 + (1 - 2*radius1) * np.random.random()
    y = radius1 + (1 - 2*radius1) * np.random.random()
    # Generates a random speed for the Person (low enough so the players don't merge into each other or move out of the grid between frames).
    constant = 0.05 + (np.sqrt(np.random.random())*0.1)
    #constant = 0.1 * np.sqrt(np.random.random()) + 0.1  for faster movement
    # Random direction of travel in radians (0 to 6, just under a full turn of 2*pi)
    random_hypo = (np.random.random()*6)
    # The speed is the hypotenuse; resolve it into x and y velocities
    vx = constant * np.cos(random_hypo)
    vy = constant * np.sin(random_hypo)
    stat = "healthy"
    player = self.PersonClass(key, x, y, vx, vy, stat, radius1, c0l0r)
    # Check that the person doesn't overlap with one that's already been placed.
    #print(f"Location:({x},{y})")
    #print(f"Velocity{player.vx}, {player.vy}")
    for person in self.people:
        if person.on_top(player):
            break
    else:
        self.people.append(player)
        return True
    return False
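
The placement strategy above is a form of rejection sampling: draw a position, and redraw it if it overlaps anyone already placed. The standalone sketch below shows the idea without the Person class. The functions `overlaps` and `place_people` are hypothetical stand-ins (the project's own overlap test is the `on_top` method, whose implementation is not shown here), and the overlap rule used is an assumption: two circles overlap when their centres are closer than the sum of their radii.

```python
import numpy as np


def overlaps(x1, y1, r1, x2, y2, r2):
    """Assumed overlap rule: centres closer than the sum of the radii."""
    return np.hypot(x1 - x2, y1 - y2) < r1 + r2


def place_people(n, radius, rng, max_tries=10_000):
    """Rejection sampling: redraw each position until it overlaps nobody."""
    placed = []
    for _ in range(n):
        for _ in range(max_tries):
            # Keep the whole circle inside the unit box
            x = radius + (1 - 2 * radius) * rng.random()
            y = radius + (1 - 2 * radius) * rng.random()
            if not any(overlaps(x, y, radius, px, py, radius) for px, py in placed):
                placed.append((x, y))
                break
        else:
            raise RuntimeError('could not place everyone; the box is too crowded')
    return placed


rng = np.random.default_rng(0)
people = place_people(100, 0.03, rng)
print(len(people))  # 100
```

The `max_tries` limit is an addition for safety: with a much larger population or radius the box can fill up, and an unbounded retry loop would never finish.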

After all 100 people are plotted inside the box, a random person is infected using the start_infection() function. This function also clears the lists used for plotting the line graph:

In [None]:
def start_infection(self):

    self.healthy_list.clear()
    self.infected_list.clear()
    self.immune_list.clear()
    self.dead_list.clear()

    for i in self.people:
        self.healthy_list.append(i)
        # i.status = "healthy"
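    # Infects a random person (randint excludes the upper bound, so start from 0
    # to give the first person a chance of being chosen too)

    c = np.random.randint(0, self.population)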
    self.people[c].infected()
    self.people[c].status = "infected"
    self.healthy_list.remove(self.people[c])
    self.infected_list.append(self.people[c])

setup_animation():

1) Creates 2 subplots, 1 for the playground (box) and another for the line graph,

2) Sets the axis limits for each subplot,

3) Creates variables for the line graph,

4) Creates a new window using the Tkinter module, which displays the 2 subplots next to each other.

FuncAnimation is run after setup_animation():

In [None]:
self.anime = FuncAnimation(self.fig, self.animate, save_count = 1000,
                              interval=1, blit=True, init_func=self.init, repeat=False)

self.fig contains the 2 subplots.

In every frame, self.animate calls advance_animation() and updates the line graph and the playground.
advance_animation() moves all the people. If 2 people overlap, their velocities are changed accordingly, and if one of them is infected, the other (healthy) person becomes infected with probability infect_probability.
Every 10th frame there is a chance for some people to become 'immune' or to 'die', depending on recover_probability and death_probability.

1 Day = 10 Frames
So every day there is a chance of someone dying or recovering.

After 100 days of simulation, an image of the playground and the line graph is saved.
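
The probability rules described above can be sketched as two small functions. This is an illustration under stated assumptions, not the code in the Simulation class: the function names are hypothetical, and the order in which death and recovery are tested is an assumption, since the text does not specify it.

```python
import numpy as np

FRAMES_PER_DAY = 10  # from the text: 1 day = 10 frames


def try_infect(status, infect_probability, rng):
    """On contact with an infected person, a healthy person may become infected."""
    if status == 'healthy' and rng.random() < infect_probability:
        return 'infected'
    return status


def daily_outcome(status, recover_probability, death_probability, rng):
    """Once per day an infected person may die, become immune, or stay infected."""
    if status != 'infected':
        return status
    if rng.random() < death_probability:   # assumed order: death is tested first
        return 'dead'
    if rng.random() < recover_probability:
        return 'immune'
    return 'infected'


rng = np.random.default_rng(1)
statuses = ['infected'] * 1000
for frame in range(1, 31):
    # Only every 10th frame counts as the end of a day
    if frame % FRAMES_PER_DAY == 0:
        statuses = [daily_outcome(s, 0.7, 0.01, rng) for s in statuses]

print({s: statuses.count(s) for s in set(statuses)})
```

With the default values (70% recovery, 1% death), most infected people become immune within a few simulated days, which is what makes the herd immunity effect visible in the line graph.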

## Bar_chart_race

Using the data from covid_data_bristol.csv (the formatted version acquired by calling the function in data_processing), a Bar Chart Race (bcr) plot is created.

A bar chart race was considered the most appropriate plot to make, even if shown statically as a compromise. The inspiration came from needing to animate the sequence of values for each date, as plotting them all on a single static chart looks far too chaotic.

The data plotted shows the new cases (the 'rolling rate' values in the Covid data) recorded in each of Bristol's MSOAs on each date.


To begin, the modules must be imported:

In [None]:
# Import Pandas for data base manipulation
# Import dataprocessing - the processed original data script/database.
# Import matplotlib to make plot and animated data
import pandas as pd
import data_processing as dp
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

With the modules imported, a database must be created and formatted to be ready for plotting:

In [None]:
# Creating the database and popping all unused values, to clean the database
# and ready it for formatting.
infectData = dp.create_database()
infectData.pop("areaCode")
infectData.pop("newCasesBySpecimenDateChange")
infectData.pop("newCasesBySpecimenDateChangePercentage")
infectData.pop("newCasesBySpecimenDateDirection")
infectData.pop("newCasesBySpecimenDateRollingSum")
infectData = infectData.sort_index()

# Uses the to_datetime function to create an ordered date sequence for the
# values to plot appropriately. sort_values returns a new dataframe, so the
# result must be assigned back.
infectData["date"] = pd.to_datetime(infectData["date"])
infectData = infectData.sort_values(by="date")

# groupData is a formatted database, reindexing the values from infectData
# and turning it into a wide-date table so it can be turned into a bcr.
groupData = infectData.pivot_table(values="newCasesBySpecimenDateRollingRate"
                                   , index="date", columns="areaName")


# Cleaning corrupted data
groupData = groupData.drop("Redland & St Andrew's", axis=1)

# Fills all NaN values with 0 and sorts the index of groupData.
groupData.fillna(0, inplace=True)
df = groupData.sort_index()

# Locates the positions based on length in groupData and counts the
# cumulative sum and updates the database respectively.
df.iloc[:, 0:-1] = groupData.iloc[:, 0:-1].cumsum()

A database 'infectData' is created by calling the function from 'data_processing'; it is then stripped of irrelevant data and sorted.
The 'date' column is converted to datetime values using the 'to_datetime' function, and the data is then sorted by date.
'groupData' is the wide-format database that takes the values of 'infectData' and pivots them with the '.pivot_table' function, giving one row per date and one column per area.
Lastly, the database is cleaned up, with any missing values replaced with zeros, and updated with the cumulative sum of the selected columns via '.iloc' in a new database 'df' for easier naming.

Now that there is a clean, formatted database, it can be plotted using matplotlib and eventually animated.

In [None]:
# This function sorts and sets the axes
def setAxes(ax):
    # ax.set_facecolor('.8')
    ax.tick_params(labelsize=10, length=0)
    # Draws white gridlines along the x axis
    ax.grid(True, axis='x', color='white')
    # Set axis below
    ax.set_axisbelow(True)
    [spine.set_visible(True) for spine in ax.spines.values()]
   


# Getting data ready to plot
def ready_data(df):
    df = df.reset_index()

    last_idx = df.index[-1] + 1
    ready_width = df.reindex(range(last_idx))
    ready_width['date'] = ready_width['date'].ffill()
    ready_width = ready_width.set_index('date')
    # Ranking each place on every date gives the position of its bar
    ready_y = ready_width.rank(axis=1, method='first')
    return ready_width, ready_y

ready_width, ready_y = ready_data(df)


labels = ready_width.columns
colors = plt.cm.Dark2(range(6))


The function 'setAxes' takes a variable 'ax' which holds the properties of the axes. It adjusts the appearance of the tick labels using '.tick_params'; 'ax.grid' and its parameters draw white gridlines along the x axis; 'ax.set_axisbelow' places them behind the bars; and lastly, all of the axis spines (the border of the plot area) are made visible.

Similarly, there is a function for preparing the data for plotting. The function 'ready_data' has the formatted database (df = groupData) as an argument. It resets the index of the database, reindexes it, and creates a 'ready_width' variable indexed by 'date', whose values become the widths (lengths) of the bars. 'ready_y' holds the rank of every area on each date, which gives each bar's position on the y axis; the parameter 'method='first'' controls how ties are ranked. Lastly, the column names are assigned as labels and a colour map is chosen.
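
A small worked example makes the ranking step concrete. The area names are taken from the report, but the values are made up for illustration:

```python
import pandas as pd

# Two dates, three areas (values are placeholders)
ready_width = pd.DataFrame(
    {'Clifton': [10, 40], 'Cotham': [30, 20], 'Redland': [20, 20]},
    index=pd.to_datetime(['2022-01-01', '2022-01-08']),
)

# Rank across each row: the smallest value gets rank 1, the largest the top rank.
# method='first' breaks ties by column order, so every bar gets its own position.
ready_y = ready_width.rank(axis=1, method='first')

print(ready_y.iloc[0].tolist())  # [1.0, 3.0, 2.0]
print(ready_y.iloc[1].tolist())  # [3.0, 1.0, 2.0]
```

On the second date Cotham and Redland are tied at 20, and 'first' places Cotham below Redland because its column comes first. Without this, tied bars would share a rank and be drawn on top of each other.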

As there is too much data to handle for a single static graph, the plot needed to be animated, so matplotlib was utilised further to format the bar-plots to work in animation.

In [None]:
def update(i):
    ax.clear()
    for line in ax.containers:
        line.remove()
    y = ready_y.iloc[i]
    width = ready_width.iloc[i]
    # Plotting bar graph
    ax.barh(y=y, width=width, color=colors, tick_label=labels)
    #Plotting the title and date
    date_str = ready_width.index[i].strftime('%d-%m-%Y')
    ax.set_title(f'Covid Cases in Bristol by Date (Rolling Rate) - {date_str}', fontsize='smaller')
    ax.set_ylim(44.5,54.5)
    return ax

# Setting the figure size
fig = plt.figure(figsize=(17, 6))
ax = fig.add_subplot(1, 1, 1)
print('Your gif is now being created, please wait...')

For the BCR, a function called 'update' is created.
The function takes the argument 'i', clears the axes with '.clear()' and removes any bars left in 'ax.containers'. It takes row 'i' of 'ready_y' as the bar positions on the y axis, and row 'i' of 'ready_width' as the bar widths. The bar chart is then plotted using the '.barh' function from matplotlib with its respective parameters. The dates are formatted as Day-Month-Year, a title is set, and the y-axis limits are set so that only the top ten bars are shown rather than all of them at once.

The figure size is set and a subplot is added.

All that's left is to animate the bar chart:

In [None]:
def genBarChart():
    anim = FuncAnimation(fig=fig, func=update, frames=len(ready_width), 
                     interval=200, repeat=False)


    anim.save("bar_chart.gif")
    print('Saved as bar_chart.gif')

The last process involves animating the bar chart.
A function 'genBarChart' is created that stores the variable 'anim', which is responsible for the animation. By using matplotlib's FuncAnimation and its parameters, a bar chart race plot is created.

FuncAnimation is given the established figure, the function that plots the bar chart for each frame, and enough frames to cover every row of the index.

Finally, the function saves the bar chart race animation as a gif.

Below is the final output, bar_chart.gif:

In [6]:
from bar_chart_race import genBarChart
from IPython.display import HTML
genBarChart()
HTML('<img src="bar_chart.gif">')

Saved as bar_chart.gif


## User interface

Using Tkinter makes it easier for the user to navigate and find what they want, and lets the user input values for the simulation such as infection_percentage, recover_percentage and death_percentage. The user can view the simulation, bar chart and choropleth map in the Tkinter window.

In User_Interface.py each page is in its own class.
There are 4 pages in total.

1) Main Page - Has buttons for the user to choose between Sim Page, Map Page, and Bar Page.

2) Sim Page  - This page has entry boxes for the user to input infection_percentage, recover_percentage and death_percentage. The simulation can be run by clicking on the Start Simulation button.

3) Map Page  - This page displays the choropleth map frame by frame, and the user can choose which frame to view by moving the slider below.

4) Bar Page  - This page displays the bar chart frame by frame, and the user can choose which frame to view by moving the slider below.

Each page has a "Return to home" button, which takes the user back to the Main Page where a different page can be selected.

The Application can be started by importing User_Interface and calling the function runUI().

In [None]:
import User_Interface as UI

UI.runUI()