# Final Assessment

Your university is collaborating with a company to develop a tool for processing reports of cases of a particular disease. The company has a very specific way of doing things, and has asked you to design and build a JavaScript object to interact with their proprietary code. Your supervisor wants you to submit an example of your code working within a Jupyter notebook (her preferred environment!) so she can review it before sending it off to the company.

Remember - your code will be tested against other input data, so make sure that you are being careful in your assumptions!

### Task 1

You have been provided with a sample of the case incidence records to help you test your code in the file "sampleCases.tsv". The company has done a questionable job of digitising these records, however, and you will need to clean them up a bit before you process them. Remember, .tsv files are "tab separated values", meaning that the entries are separated by tabs rather than commas (see [here](https://docs.python.org/3/reference/lexical_analysis.html#literals) for more information).

Read in the file and store each record as an element within a list.

In [1]:
import csv

list_of_records = []
with open("sampleCases.tsv", newline="") as tsvfile:
    reader = csv.reader(tsvfile, delimiter='\t')
    next(reader, None)  # skip the header row

    for row in reader:
        list_of_records.append(row[1:])  # drop the leading id column
print(list_of_records)


[['8.167', '-10.650', '28/03/2014', 'suspected'], ['8.127', '-10.712', '26/05/2014', 'confirmed, presented on 24/05/2014'], ['8.198', '-10. 688', '27/05/2014', 'suspected, contact with case?2'], ['8.227a', '-10.677', '27/05/2014', 'suspected, contact with case 2'], ['8.240', '-10.639', '27/05/2014', 'suspected'], ['8.208', '-10.558', '28/05/2014', 'suspected,,,'], ['8.221', '-10.714', '28/05/2014', 'suspected, contact with case 2'], ['8!251', '-10.691', '28/05/2014', 'confirmed,,contact with case_2'], ['8.191', '-10.659', '28/05/2014', 'confirmed,_contact with case 8'], ['8.192', '_-10.741', '28/05/2014', 'suspected']]


### Task 2

If you look at the records, you will find that the latitudes and longitudes of cases are sometimes formatted strangely - apparently the automatic text recognition software used by the company has a few bugs in it! Go through the records and prompt the user to input the corrected values in cases where the elements cannot be processed as numbers.

In [2]:
import pandas as pd
import re

df = pd.read_csv("sampleCases.tsv",sep="\t",dtype=str) #read every column as text so the checks below always receive strings

#store for later use
%store df

#replace wrong values for latitude
for item in df['lat']:
    if re.search("[^0-9.-]",item): #if item has anything other than a number,negative sign or decimal point
        item_replacement = input("Enter a correct value to replace latitude {} : ".format(item))
        df.loc[df["lat"]==item,['lat']]=item_replacement

#replace wrong values for longitude
for item in df['lon']:
    if re.search("[^0-9.-]",item): #if item has anything other than a digit, minus sign or decimal point
        item_replacement = input("Enter a correct value to replace longitude {} : ".format(item))
        df.loc[df["lon"]==item,['lon']]=item_replacement
print("\n")
print(df)


Stored 'df' (DataFrame)
Enter a correct value to replace latitude 8.227a : 8.227
Enter a correct value to replace latitude 8!251 : 8.251
Enter a correct value to replace longitude -10. 688 : -10.688
Enter a correct value to replace longitude _-10.741 : -10.741


   id    lat      lon        date                               notes
0   1  8.167  -10.650  28/03/2014                           suspected
1   2  8.127  -10.712  26/05/2014  confirmed, presented on 24/05/2014
2   3  8.198  -10.688  27/05/2014      suspected, contact with case?2
3   4  8.227  -10.677  27/05/2014      suspected, contact with case 2
4   5  8.240  -10.639  27/05/2014                           suspected
5   6  8.208  -10.558  28/05/2014                        suspected,,,
6   7  8.221  -10.714  28/05/2014      suspected, contact with case 2
7   8  8.251  -10.691  28/05/2014      confirmed,,contact with case_2
8   9  8.191  -10.659  28/05/2014      confirmed,_contact with case 8
9  10  8.192  -10.741  28/05/2014                           suspected
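
The regular-expression check only looks for unexpected characters, so a string such as `8..1` would pass it even though it is not a number. Below is a minimal sketch of a stricter test, assuming that "can be processed as a number" means `float()` accepts the string and the result is finite. The helper names `is_number` and `clean_value` are hypothetical, and the `ask` parameter is an illustrative addition so the prompt can be replaced when testing.

```python
import math


def is_number(text):
    """Return True if text parses as a finite float (assumed definition of 'a number')."""
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


def clean_value(text, label, ask=input):
    """Keep prompting through ask() until the value parses, then return it as a float."""
    while not is_number(text):
        text = ask("Enter a correct value to replace {} {} : ".format(label, text))
    return float(text)
```

Applied to a column, this could look like `[clean_value(v, "latitude") for v in df["lat"]]`, which also re-prompts if the replacement typed by the user is itself malformed.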

### Task 3

It's time to format the data so that you can transfer it to the JavaScript portion of your work. You'll need to copy over the latitudes and longitudes. You may format these as a set of dictionaries, a pair of lists, or some other structure - it is up to you. 

In [33]:
location_dictionary=dict(zip(df['lon'],df['lat']))
print(location_dictionary) #paired as longitude:latitude; records that share a longitude would overwrite one another

{'-10.650': '8.167', '-10.712': '8.127', '-10.688': '8.198', '-10.677': '8.227', '-10.639': '8.240', '-10.558': '8.208', '-10.714': '8.221', '-10.691': '8.251', '-10.659': '8.191', '-10.741': '8.192'}
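
A dictionary keyed on longitude silently drops a record whenever two cases share a longitude, which matters once the code runs on other input data. The following is a sketch of one alternative, assuming a list of `[lat, lon]` pairs is an acceptable structure for the transfer. `to_points` is a hypothetical helper, and the sample values are copied from the first three cleaned records above.

```python
import json


def to_points(lats, lons):
    """Pair up latitudes and longitudes as floats, keeping duplicate coordinates."""
    return [[float(lat), float(lon)] for lat, lon in zip(lats, lons)]


# Sample values taken from the first three cleaned records
points = to_points(["8.167", "8.127", "8.198"], ["-10.650", "-10.712", "-10.688"])

# JSON text for a list of lists is also a valid JavaScript array literal
points_json = json.dumps(points)
print(points_json)
```

Converting to `float` here also means the JavaScript side receives numbers rather than strings.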


### Task 4

The company wants you to create a kind of object called a `CaseStudy` to help them deal with all of the records associated with this file. In the future, they will use other `CaseStudy` objects to compare this set of data with other sets of data. But in this case, you just want to test creating a single `CaseStudy` and using its functions. 

The `CaseStudy` object should be able to:

* return its geographic extent (the minimum and maximum of both latitude and longitude) as a list (e.g. `[min_lat, max_lat, min_lon, max_lon]`)
* return the weighted centroid of the points as a list (e.g. `[cent_lat, cent_lon]`)

The `CaseStudy` object should also have attributes to hold:

* the name of the case study (a string)
* the year in which the case study was conducted (an integer)
* a value indicating whether the company took part in gathering the data (a boolean)

Your supervisor would like for you to create the instance of `CaseStudy` based on the data given here, calculate both the geographic extent and the weighted centroid, and export those two values back to your Python environment. You may set the name, year, and participation values of the `CaseStudy` to whatever you like.
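
Before writing the JavaScript, it can help to have a Python reference to check the exported numbers against. The sketch below assumes the points arrive as a list of `[lat, lon]` pairs. Because the task does not say what the centroid is weighted by, the weights are an optional argument that defaults to equal weight per case. The attributes mirror the ones requested, but the method names `extent` and `centroid` and the sample values are illustrative only.

```python
class CaseStudy:
    """Python reference for cross-checking the JavaScript object (not the required deliverable)."""

    def __init__(self, name, year, participated, points):
        self.name = name                  # name of the case study (string)
        self.year = year                  # year the study was conducted (integer)
        self.participated = participated  # did the company help gather the data (boolean)
        self.points = points              # list of [lat, lon] pairs

    def extent(self):
        """Return [min_lat, max_lat, min_lon, max_lon]."""
        lats = [p[0] for p in self.points]
        lons = [p[1] for p in self.points]
        return [min(lats), max(lats), min(lons), max(lons)]

    def centroid(self, weights=None):
        """Return [cent_lat, cent_lon]; equal weights are assumed when none are given."""
        if weights is None:
            weights = [1.0] * len(self.points)
        total = sum(weights)
        cent_lat = sum(w * p[0] for w, p in zip(weights, self.points)) / total
        cent_lon = sum(w * p[1] for w, p in zip(weights, self.points)) / total
        return [cent_lat, cent_lon]


# Made-up coordinates, chosen so the results are easy to verify by hand
study = CaseStudy("Sample outbreak", 2014, False,
                  [[8.0, -10.0], [9.0, -11.0], [8.5, -10.5]])
print(study.extent())
print(study.centroid())
```

An empty list of points raises an error in both methods, so a real implementation would need to decide how to handle that case.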

In [None]:
%%javascript

// you can use this to test/practice your JavaScript!

In [None]:
from IPython.display import display, Javascript

display(Javascript("""var someval = 3, someotherval = 5;
IPython.notebook.kernel.execute('SOMEVAL=' + someval + ';');
IPython.notebook.kernel.execute('SOMEOTHERVAL=' + someotherval + ';');
"""))

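One way to get the Python data into the JavaScript cell is to serialise it with `json.dumps` and splice the text into the script before displaying it. The sketch below only builds the source string. `JS_TEMPLATE`, `build_script`, and the JavaScript variable names are hypothetical, and `IPython.notebook.kernel.execute` is assumed to be available, which holds in the classic Notebook interface but may not hold in other front ends.

```python
import json

# %s marks where the JSON-encoded points are spliced into the JavaScript source
JS_TEMPLATE = """var points = %s;
var lats = points.map(function (p) { return p[0]; });
var minLat = Math.min.apply(null, lats);
IPython.notebook.kernel.execute('MIN_LAT=' + minLat + ';');
"""


def build_script(points):
    """Return JavaScript source with the points embedded as an array literal."""
    return JS_TEMPLATE % json.dumps(points)


script = build_script([[8.167, -10.65], [8.127, -10.712]])
print(script)
```

In the notebook, passing the result to `display(Javascript(script))` would then run it in the browser.
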
### Task 5

As a final step, write out the geographic extent, centroid, and the original filename of the data to a file named "output.txt".
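
Here is a minimal sketch of this step, assuming the extent and centroid have already been exported back to Python as lists. The function name `write_summary`, the one-item-per-line layout, and the sample numbers are all illustrative, since the task does not prescribe a format.

```python
def write_summary(path, source_name, extent, centroid):
    """Write the source filename, extent and centroid to a plain-text file."""
    with open(path, "w") as out:
        out.write("source file: {}\n".format(source_name))
        out.write("extent [min_lat, max_lat, min_lon, max_lon]: {}\n".format(extent))
        out.write("centroid [cent_lat, cent_lon]: {}\n".format(centroid))


# Placeholder numbers; substitute the values exported from the JavaScript cell
write_summary("output.txt", "sampleCases.tsv", [8.0, 9.0, -11.0, -10.0], [8.5, -10.5])
```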