# [Task2 Prepare Submission]
The notebook title should be consistent with the file name. The file name should follow this pattern:    
*author's initials_progressive number_title.ipynb*    
For example:    
*EF_01_Data Exploration.ipynb*
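
The naming convention above can be checked mechanically. The sketch below is only an illustration: the regular expression and the helper name `is_valid_notebook_name` are assumptions (uppercase initials, a two-digit progressive number, then a free-form title), not part of the template.

```python
import re

# Assumed reading of the convention: INITIALS_NN_Title.ipynb
NOTEBOOK_NAME_RE = re.compile(
    r"^(?P<initials>[A-Z]{2,})_(?P<number>\d{2})_(?P<title>.+)\.ipynb$"
)


def is_valid_notebook_name(filename: str) -> bool:
    """Return True if the file name matches the assumed naming pattern."""
    return NOTEBOOK_NAME_RE.match(filename) is not None


print(is_valid_notebook_name("EF_01_Data Exploration.ipynb"))
print(is_valid_notebook_name("data exploration.ipynb"))
```

If your team uses a different number of digits or lowercase initials, adjust the pattern accordingly.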

## Purpose
State the purpose of the notebook.

## Methodology
Quickly describe assumptions and processing steps.

## WIP - improvements
Use this section only if the notebook is not final.

Notable TODOs:
- todo 1;
- todo 2;
- todo 3.

## Results
Describe and comment on the most important results.

## Suggested next steps
State suggested next steps, based on results obtained in this notebook.

# Setup

## Library import
We import all the required Python libraries.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import os
import sys
import joblib
from os import path

# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Common things
# from sklearn.metrics import classification_report
# from scipy import stats

# Visualizations
# import matplotlib as plt
import matplotlib.pyplot as plt
import seaborn as sns
# sns.set_style("white") # darkgrid, whitegrid, dark, white, and ticks
# plt.figure(figsize=(7, 7))

# Autoreload extension
# if 'autoreload' not in get_ipython().extension_manager.loaded:
#     %load_ext autoreload
    
# %autoreload 2

In [3]:
# Examples seaborn
# with sns.axes_style("whitegrid"):
#     fig, axis = plt.subplots(1, 2, figsize=(20, 5), sharey=True)
#     fig.suptitle(f'Position distribution on splits')
#     sns.boxplot(ax=axis[0], data=df_prep, y='event', x='ith_pos', order=event_label_map.values())
#     sns.boxplot(ax=axis[1], data=pd.read_pickle(path.join(DATA_PATH, "stage2_test.pkl") ), y='event', x='ith_pos', order=event_label_map.values())

## Local library import
We import all the required local libraries.

In [4]:
# Include local library paths
import sys
# sys.path.append('path/to/local/lib') # uncomment and fill to import local libraries
# add project folders so local libraries can be imported
sys.path.insert(1, os.path.join(os.getcwd(), '..'))
sys.path.insert(1, os.path.join(os.getcwd(), '../src'))

# Import local libraries
# from plibs.utils import corrstats
# from src.plibs.utils import plots as myplots

In [5]:
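# notebook misc functions
from IPython.display import HTML, display

def pretty_print(df):
    # Render the frame as HTML, turning escaped newlines in cell text into line breaks
    return display(HTML(df.to_html().replace("\\n", "<br>")))

def displayAll(df):
    # Temporarily lift the pandas display limits to show the whole frame
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
        display(df)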

# Parameter definition
We set all relevant parameters for our notebook. By convention, parameter names are uppercase, while all
other variables follow the standard Python naming guidelines (PEP 8).
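
As an illustration of this convention, a parameter cell might look like the sketch below. The names `DATA_PATH`, `RANDOM_SEED` and `LABELS` and their values are hypothetical placeholders; only the test file location is taken from the data import cell of this notebook.

```python
from pathlib import Path

# Parameters: uppercase by convention (hypothetical names and values)
DATA_PATH = Path("../data/input/task2")
RANDOM_SEED = 42
LABELS = ("AGAINST", "FAVOR", "NONE")

# Ordinary variables: lowercase with underscores, as in PEP 8
test_file = DATA_PATH / "official_test" / "test.tsv"
print(test_file)
```

Keeping parameters in a single cell near the top makes it easier to rerun the notebook with different settings.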

In [6]:
# *** parameters for GUILD tracked projects ****
%env GUILD_HOME=../tracking/.guild
#os.environ['GUILD_HOME']
from guild import ipy
GUILD_HOME = ipy.guild_home()

# RUN_ID = "df2d52de187f4ed0ba5c1f10ed7bfab7" # example
# #RUN_PATH = f"{ipy.guild_home()}/runs"
# RUN_PATH = f"{ipy.guild_home()}/runs/{RUN_ID}"
# os.listdir(RUN_PATH)

def run_path(runid) -> str:
    return f"{ipy.guild_home()}/runs/{runid}"

def run_files(runid):
    return os.listdir(run_path(runid))

env: GUILD_HOME=../tracking/.guild



# Data import
We retrieve all the required data for the analysis.

## Original test data

In [11]:
df_original = pd.read_csv("../data/input/task2/official_test/test.tsv", sep='\t')
print(df_original.shape)
display(df_original.head())
display(df_original.tail())

(9955, 3)


| | id | claim | text |
|---|---|---|---|
| 0 | 1307558525371965442 | school closures | @narendramodi @rajnathsingh Student ka bhi soa... |
| 1 | 1247739239879467009 | stay at home orders | —échale un vistazo a esto… … a fair piece on... |
| 2 | 1242046510155653125 | stay at home orders | Why do think skilling women and girls is impor... |
| 3 | 1358446499949084675 | school closures | To reduce the risk of the virus spreading as e... |
| 4 | 1249740062775902208 | stay at home orders | I speak for a great many people when i say WE ... |

| | id | claim | text |
|---|---|---|---|
| 9950 | 1242516037628813314 | stay at home orders | StayAtHomeSaveLives 21daysLockdown StayAtHome ... |
| 9951 | 1242746919933415424 | stay at home orders | If this is true this is heartbreaking StayAtHo... |
| 9952 | 1276638598813679617 | stay at home orders | 855 Sunset Cove Dr, Winter Haven, FL 33880 3 B... |
| 9953 | 1243504288661270528 | stay at home orders | StayAtHomeSaveLives StayHomeStaySafe StayHome ... |
| 9954 | 1237875841981247488 | school closures | We’re on track to be like Italy is now. Time t... |


## Predicted data

In [12]:
df = pd.read_csv(path.join(run_path("febedf66f2394b7191958ae1d524588b"), "predictions.tsv"), sep='\t')
print(df.shape)
display(df.head())
display(df.tail())

(9955, 13)


| | Unnamed: 0 | id | claim | text | tweet_text | tweet_text_clean | tgt_lang | Claim2 | logits_0 | logits_1 | logits_2 | yhat | yhat_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1307558525371965442 | school closures | @narendramodi @rajnathsingh Student ka bhi soa... | @narendramodi @rajnathsingh Student ka bhi soa... | @USER @USER Student ka bhi soach lijiye sahab... | True | Schools need to remain closed. | 0.017222 | 0.037237 | 0.945541 | 2 | NONE |
| 1 | 1 | 1247739239879467009 | stay at home orders | —échale un vistazo a esto… … a fair piece on... | —échale un vistazo a esto… … a fair piece on... | —échale un vistazo a esto… … a fair piece on c... | True | Stay at home is a needed measure. | 0.372164 | 0.150666 | 0.477169 | 2 | NONE |
| 2 | 2 | 1242046510155653125 | stay at home orders | Why do think skilling women and girls is impor... | Why do think skilling women and girls is impor... | Why do think skilling women and girls is impor... | True | Stay at home is a needed measure. | 0.004298 | 0.007862 | 0.98784 | 2 | NONE |
| 3 | 3 | 1358446499949084675 | school closures | To reduce the risk of the virus spreading as e... | To reduce the risk of the virus spreading as e... | To reduce the risk of the virus spreading as e... | True | Schools need to remain closed. | 0.015017 | 0.947245 | 0.037738 | 1 | FAVOR |
| 4 | 4 | 1249740062775902208 | stay at home orders | I speak for a great many people when i say WE ... | I speak for a great many people when i say WE ... | I speak for a great many people when i say WE ... | True | Stay at home is a needed measure. | 0.926335 | 0.03229 | 0.041375 | 0 | AGAINST |

| | Unnamed: 0 | id | claim | text | tweet_text | tweet_text_clean | tgt_lang | Claim2 | logits_0 | logits_1 | logits_2 | yhat | yhat_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9950 | 9950 | 1242516037628813314 | stay at home orders | StayAtHomeSaveLives 21daysLockdown StayAtHome ... | StayAtHomeSaveLives 21daysLockdown StayAtHome ... | StayAtHomeSaveLives 21daysLockdown StayAtHome ... | True | Stay at home is a needed measure. | 0.009671 | 0.944242 | 0.046087 | 1 | FAVOR |
| 9951 | 9951 | 1242746919933415424 | stay at home orders | If this is true this is heartbreaking StayAtHo... | If this is true this is heartbreaking StayAtHo... | If this is true this is heartbreaking StayAtHo... | True | Stay at home is a needed measure. | 0.006706 | 0.962074 | 0.031219 | 1 | FAVOR |
| 9952 | 9952 | 1276638598813679617 | stay at home orders | 855 Sunset Cove Dr, Winter Haven, FL 33880 3 B... | 855 Sunset Cove Dr, Winter Haven, FL 33880 3 B... | 855 Sunset Cove Dr, Winter Haven, FL 33880 3 B... | True | Stay at home is a needed measure. | 0.094186 | 0.68045 | 0.225364 | 1 | FAVOR |
| 9953 | 9953 | 1243504288661270528 | stay at home orders | StayAtHomeSaveLives StayHomeStaySafe StayHome ... | StayAtHomeSaveLives StayHomeStaySafe StayHome ... | StayAtHomeSaveLives StayHomeStaySafe StayHome ... | True | Stay at home is a needed measure. | 0.01212 | 0.932844 | 0.055036 | 1 | FAVOR |
| 9954 | 9954 | 1237875841981247488 | school closures | We’re on track to be like Italy is now. Time t... | We’re on track to be like Italy is now. Time t... | We’re on track to be like Italy is now. Time t... | True | Schools need to remain closed. | 0.004402 | 0.964637 | 0.030961 | 1 | FAVOR |
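
Both frames report 9955 rows, and the sampled rows show the same `id` values in the same positions. Before preparing a submission it may be worth confirming this for every row. The helper below is a minimal sketch; the name `check_alignment` is hypothetical, and the toy frames stand in for `df_original` and `df`.

```python
import pandas as pd


def check_alignment(original: pd.DataFrame, predicted: pd.DataFrame, key: str = "id") -> None:
    """Raise AssertionError unless both frames hold the same keys in the same order."""
    assert len(original) == len(predicted), "row counts differ"
    same_keys = (original[key].to_numpy() == predicted[key].to_numpy()).all()
    assert same_keys, f"'{key}' values differ or are reordered"


# Toy frames standing in for df_original and df (values are illustrative only)
toy_original = pd.DataFrame(
    {"id": [11, 12, 13], "claim": ["school closures", "stay at home orders", "school closures"]}
)
toy_predicted = toy_original.assign(yhat_label=["NONE", "FAVOR", "AGAINST"])

check_alignment(toy_original, toy_predicted)
print("frames are aligned")
```

In this notebook the call would be `check_alignment(df_original, df)`.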


# Data processing
Put here the core of the notebook. Feel free to further split this section into subsections.
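
A possible starting point for this section is sketched below. It assumes the submission is a tab-separated file with one row per tweet, holding the `id` and the predicted label. The output column name `label`, the helper `build_submission` and the set `VALID_LABELS` are assumptions for illustration; check the official task instructions for the required format.

```python
import io

import pandas as pd

# Labels seen in the yhat_label column of the sampled predictions
VALID_LABELS = {"AGAINST", "FAVOR", "NONE"}


def build_submission(predictions: pd.DataFrame) -> pd.DataFrame:
    """Keep the id and the predicted label, renamed to the assumed output column."""
    submission = predictions[["id", "yhat_label"]].rename(columns={"yhat_label": "label"})
    unknown = set(submission["label"]) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return submission.reset_index(drop=True)


# Toy predictions standing in for df (ids copied from the sample rows above)
toy = pd.DataFrame(
    {
        "id": [1307558525371965442, 1358446499949084675, 1249740062775902208],
        "yhat": [2, 1, 0],
        "yhat_label": ["NONE", "FAVOR", "AGAINST"],
    }
)
submission = build_submission(toy)

# index=False keeps the row index out of the file, avoiding an extra unnamed column
buffer = io.StringIO()
submission.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```

To write to disk, pass a file path to `to_csv` instead of the in-memory buffer.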

# References
We report here relevant references:
1. author1, article1, journal1, year1, url1
2. author2, article2, journal2, year2, url2