# SOCIETAL FACTORS (20 mins)
# Instructions / Notes: Read these carefully
This **Python Jupyter Notebook** is split into the following sections:

1. **Initial section** with pre-filled cells that you only need to run. They load some Python modules (packages), the dataset required for your task, and its variables into memory.
2. **Middle section** with **description of a concrete task** associated with the dataset, and some **ideas** related to the task that you may choose to work with. 
3. **Final section (with one or more empty cells)** where you can perform analyses with the loaded dataset (e.g., write a few lines of code if needed), answer the question posed, and describe your reasoning in words.

**Read and execute each cell in order, without skipping forward**. To execute any cell, press **Shift+Enter** on your keyboard. It might take a couple of seconds to receive an output. 

Have fun!

In [1]:
# Run the following to import the necessary packages and load the dataset.
import sys
import pandas as pd
import numpy as np
import scipy as sp
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from othersSC_SD import different_idea
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
matplotlib.style.use('ggplot')

d1 = "SC_dataset.csv"
df1 = pd.read_csv(d1)
# Display the dataset to check that it loaded properly.
df1

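| Unnamed: 0 | Movie_theatres | Emigrants | Train_transport | Name_Peter |
|---:|---:|---:|---:|---:|
| 0 | 165 | 43980 | 90564 | 271 |
| 1 | 162 | 43481 | 86842 | 223 |
| 2 | 166 | 43466 | 88374 | 204 |
| 3 | 163 | 45017 | 89575 | 200 |
| 4 | 165 | 45869 | 90456 | 148 |
| 5 | 165 | 46786 | 90407 | 162 |
| 6 | 167 | 42708 | 88615 | 145 |
| 7 | 164 | 48171 | 91111 | 128 |
| 8 | 163 | 51988 | 92146 | 119 |
| 9 | 162 | 52097 | 92995 | 120 |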


# DATASET DESCRIPTION
The dataset above contains various societal factors in **country 1** over the period 2001 to 2015:
1. **Number of movie theaters**
2. **Number of emigrants (people leaving the country)** 
3. **Number of people transported by trains**
4. **Number of newborns given the name Peter** 

# TASK
Design **as many measures as you can** to identify **strong AND meaningful** relationships among pairs of columns in this dataset. Your rationale should take every data point in the dataset into account. We expect you to **generate multiple measures**.

For **each measure that you design**: 

1. Please mark which relationships in the dataset are strong and meaningful (using yes/no)
2. Please provide a brief **reasoning** behind your answer (an explanation of **why** you took certain steps or performed certain calculations to get to the solution)
3. Please mark your **confidence** in the designed measure (on a scale of 1 to 5)
4. Please mark how many **ideas you have requested** so far to develop your measure (e.g., 0, 1, 2, etc.)

**MAKE SURE** to fill all four fields for each measure.
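The task asks for relationships that are strong *and* meaningful. Several columns here drift steadily across the years: in the displayed rows, emigration mostly rises and the number of newborns named Peter mostly falls. Any two trending series can look strongly correlated without being related. One common check is to correlate year-over-year changes (`DataFrame.diff`) instead of raw levels. If a correlation shrinks sharply on the changes, it was probably driven by the shared trend.

This is offered as a sketch of one possible approach, not a method the task prescribes. It assumes the rows are in chronological order. It also hard-codes only the ten rows displayed in the dataset output. The description mentions 2001 to 2015, so those rows may not be the full data; in the notebook, use `df1.drop(columns="Unnamed: 0")` instead.

```python
import pandas as pd

# The ten rows displayed in the dataset output (possibly only part of the data;
# in the notebook, use df1.drop(columns="Unnamed: 0") instead).
df = pd.DataFrame({
    "Movie_theatres": [165, 162, 166, 163, 165, 165, 167, 164, 163, 162],
    "Emigrants": [43980, 43481, 43466, 45017, 45869, 46786, 42708, 48171, 51988, 52097],
    "Train_transport": [90564, 86842, 88374, 89575, 90456, 90407, 88615, 91111, 92146, 92995],
    "Name_Peter": [271, 223, 204, 200, 148, 162, 145, 128, 119, 120],
})

# Assumes rows are in chronological order. diff() gives year-over-year
# changes; the first row has no predecessor (NaN), so it is dropped.
changes = df.diff().dropna()

level_r = df.corr(method="pearson").round(2)        # correlation of the raw levels
change_r = changes.corr(method="pearson").round(2)  # correlation of the yearly changes

print(level_r)
print(change_r)
# A large |r| in level_r that shrinks toward 0 in change_r suggests a shared trend.
```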

# IDEA:
**Read more about statistical dependence among variables here: https://en.wikipedia.org/wiki/Correlation_and_dependence**. 
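pandas' `DataFrame.corr` can turn the idea of statistical dependence into concrete measures. It supports `method="pearson"`, which measures linear association, and `method="spearman"`, which measures monotonic association and is computed on ranks.

The sketch below assumes the ten rows displayed in the dataset output. The description mentions 2001 to 2015, so these rows may be only part of the data. In the notebook you would call the same methods on `df1.drop(columns="Unnamed: 0")` instead of the hard-coded values. Results are rounded to 2 decimals, as the template asks.

```python
import pandas as pd

# The ten rows displayed in the dataset output (possibly only part of the data).
df = pd.DataFrame({
    "Movie_theatres": [165, 162, 166, 163, 165, 165, 167, 164, 163, 162],
    "Emigrants": [43980, 43481, 43466, 45017, 45869, 46786, 42708, 48171, 51988, 52097],
    "Train_transport": [90564, 86842, 88374, 89575, 90456, 90407, 88615, 91111, 92146, 92995],
    "Name_Peter": [271, 223, 204, 200, 148, 162, 145, 128, 119, 120],
})

# Pearson: strength of a linear relationship, from -1 to +1.
pearson = df.corr(method="pearson").round(2)
# Spearman: Pearson computed on ranks, so it captures any monotonic relationship.
spearman = df.corr(method="spearman").round(2)

print(pearson)
print(spearman)
```

If the two matrices differ noticeably for a pair, the relationship is probably monotonic but not linear.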

# Important note about the idea
You may choose whether or not to work with this idea in developing your measure. If the idea is not helpful and you are unsure whether or how to design new measures (or revise measures you already designed), you can ask for a different idea by typing **different_idea("SC", "SD1")** in the code cell below the template cell.


In [None]:
#Template for designing a measure. Make copies of this template cell to create as many measures as you are able to (within the allotted time).

#NOTE: Round all your statistics to 2 decimal places before reasoning with them!! 

#REPORT YOUR ANSWER (e.g., yynnyy, yyynnn, ynynyy etc)
strong_and_meaningful_relationships = 'nnyynn'
#Choose "y" for YES and "n" for NO for the 6 pairs of dataset relationships in the following sequence:
#MovieTheatres-Emigrants --> MovieTheatres-TrainTransport --> MovieTheatres-NamePeter --> Emigrants-TrainTransport --> Emigrants-NamePeter --> TrainTransport-NamePeter
print(strong_and_meaningful_relationships)

#REPORT YOUR REASONING
relationships_reasoning = 'Computed the correlation distance (1 - Pearson r) for every pair with scipy.spatial.distance.correlation and marked the pairs with the lowest distances as strong.'
print(relationships_reasoning)

#REPORT CONFIDENCE IN YOUR SOLUTION
confidence_measure = '2' 
#Choose among: 1 (low confidence), 2, 3 (medium confidence), 4, 5 (high confidence)
print(confidence_measure)

#REPORT A COUNT OF IDEAS REQUESTED SO FAR TO DEVELOP YOUR SOLUTION
ideas_asked_so_far_measure = '1'
#Choose among: 0 (Did not use the provided idea), 1 (Only used the provided idea), 2 (Asked one additional idea), 3 (Asked two additional ideas)
print(ideas_asked_so_far_measure)
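The six-letter answer string follows a fixed pair order. Below is a minimal sketch of filling it from data, with these assumptions:

- The measure is Pearson r.
- The "strong" cut-off is a hypothetical |r| ≥ 0.7; the task does not specify one.
- `strength_flags` is an illustrative helper, not part of the notebook.

The helper only judges strength, so the "meaningful" half of each answer still needs its own reasoning. In the notebook it would be called as `strength_flags(df1)`. The demo below uses a small synthetic frame whose correlations are exactly +1, -1 and 0.

```python
import pandas as pd

# Pair order required by the answer template.
PAIR_ORDER = [
    ("Movie_theatres", "Emigrants"),
    ("Movie_theatres", "Train_transport"),
    ("Movie_theatres", "Name_Peter"),
    ("Emigrants", "Train_transport"),
    ("Emigrants", "Name_Peter"),
    ("Train_transport", "Name_Peter"),
]


def strength_flags(df, threshold=0.7):
    """Hypothetical helper: 'y' where |Pearson r| (rounded to 2 dp) >= threshold.

    The 0.7 default is an illustrative cut-off, not one given by the task.
    Only strength is judged here; meaningfulness needs separate reasoning.
    """
    flags = []
    for a, b in PAIR_ORDER:
        r = round(df[a].corr(df[b]), 2)  # Series.corr defaults to Pearson
        flags.append("y" if abs(r) >= threshold else "n")
    return "".join(flags)


# Synthetic frame with known relationships (not the notebook's data).
demo = pd.DataFrame({
    "Movie_theatres": [1, 2, 3, 4],
    "Emigrants": [2, 4, 6, 8],        # r = +1 with Movie_theatres
    "Train_transport": [4, 3, 2, 1],  # r = -1 with Movie_theatres
    "Name_Peter": [1, -1, -1, 1],     # r = 0 with the other three
})
print(strength_flags(demo))  # → yynynn
```

Taking the absolute value means strong negative relationships count as strong too.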


In [4]:
#ONLY use this space below to write your code (if needed) for any measure you generate. DO NOT ERASE this code segment from the workbook.

#IF YOU WANT TO ASK FOR A DIFFERENT IDEA, UNCOMMENT THE LINE BELOW, AND JUST RE-RUN THIS CELL
#different_idea("SC","SD1")


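import scipy.spatial  # older SciPy releases do not load submodules on `import scipy` alone

# Correlation distance = 1 - Pearson r (0 = perfect positive, 1 = none, 2 = perfect negative).
# Each bare expression is displayed because ast_node_interactivity = "all".
sp.spatial.distance.correlation(df1['Movie_theatres'], df1['Emigrants'])
sp.spatial.distance.correlation(df1['Movie_theatres'], df1['Train_transport'])
sp.spatial.distance.correlation(df1['Movie_theatres'], df1['Name_Peter'])

sp.spatial.distance.correlation(df1['Emigrants'], df1['Train_transport'])
sp.spatial.distance.correlation(df1['Emigrants'], df1['Name_Peter'])
sp.spatial.distance.correlation(df1['Train_transport'], df1['Name_Peter'])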





#Your intuitive ideas are valuable! If you need syntax-related help in implementing your ideas, you can access the following documentation files (use the "Search" tab for queries) and/or summarized syntax sheets.

#a) Pandas library
#Documentation file: https://pandas.pydata.org/pandas-docs/stable/
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/fbc502d0-46b2-4e1b-b6b0-5402ff273251

#b) Numpy library
#Documentation file: https://docs.scipy.org/doc/numpy/user/index.html
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/e9f83f72-a81b-42c7-af44-4e35b48b20b7

#c) Matplotlib library
#Documentation file: https://matplotlib.org/contents.html
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/28b8210c-60cc-4f13-b0b4-5b4f2ad4790b

#d) Scipy library
#Documentation file: https://docs.scipy.org/doc/scipy/reference/
#Syntax sheet: https://datacamp-community-prod.s3.amazonaws.com/5710caa7-94d4-4248-be94-d23dea9e668f

'Read more about correlation and causation here: https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation. If this idea changes measures you designed earlier or inspires you to create a new measure, try again. In case the idea information is not helpful and you are not sure if/how you might design new measures (or, revise measures you already designed), you can ask for a different idea by calling the function different_idea("SC","SD2").'

1.8034250982844886

1.7084054458333857

0.4173730137773142

0.08653021375769054

1.884907289753981

1.7963703520021377
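`scipy.spatial.distance.correlation` returns 1 - r, where r is the Pearson correlation. Its values therefore run from 0 (perfect positive) through 1 (no linear relationship) to 2 (perfect negative). Reading the six outputs above back as r makes the sign visible. Distances above 1 are negative correlations, which a "lowest distance" rule ranks as weak even when |r| is large.

The sketch below converts the printed values, copied from the output in the order the six calls were made. For illustration only, it flags pairs with |r| ≥ 0.7. That cut-off is a hypothetical choice, not one set by the task, and it says nothing about whether a relationship is meaningful.

```python
# Correlation distances printed by the cell above, in the order the six
# scipy calls were made (which matches the answer template's pair order).
distances = {
    "MovieTheatres-Emigrants": 1.8034250982844886,
    "MovieTheatres-TrainTransport": 1.7084054458333857,
    "MovieTheatres-NamePeter": 0.4173730137773142,
    "Emigrants-TrainTransport": 0.08653021375769054,
    "Emigrants-NamePeter": 1.884907289753981,
    "TrainTransport-NamePeter": 1.7963703520021377,
}

# scipy's correlation distance is d = 1 - r, so r = 1 - d; round to 2 dp as the template asks.
pearson_r = {pair: round(1.0 - d, 2) for pair, d in distances.items()}

THRESHOLD = 0.7  # hypothetical "strong" cut-off, chosen only for illustration
flags = "".join("y" if abs(r) >= THRESHOLD else "n" for r in pearson_r.values())

for pair, r in pearson_r.items():
    print(f"{pair}: r = {r:+.2f}")
print(flags)  # → yynyyy
```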