![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto's Weekly Data Visualization
## WeRateDogs inflationary scoring

### Recommended grade level: 10-12

Callysto's Weekly Data Visualization is a learning resource that helps Grades 5-12 teachers and students grow and develop data literacy skills. We do this by providing a data visualization, like a graph, and asking teachers and students to interpret it. This companion resource walks learners through how the data visualization is created and interpreted using the data science process. The steps of this process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need.
3. Organize - Arrange the data so that you can easily explore it.
4. Explore - Examine the data to look for evidence to answer our question. This includes creating visualizations.
5. Interpret - Explain how the evidence answers our question.
6. Communicate - Reflect on the interpretation.

## 1. Question

## 2. Gather

In [1]:
import numpy as np
import pandas as pd
import re
import plotly.graph_objects as go
import plotly.express as px
import os

In [2]:
path = os.path.join('datasets', 'dog_rates_tweets.csv')
data = pd.read_csv(path, dtype={'text': str}, parse_dates=['created_at']).set_index(keys='id')

## 3. Organize

In [3]:
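def text_to_rating(text):
    """Return the 'n/10' score found in a tweet as a fraction (13/10 -> 1.3),
    or None if the text contains no such score."""
    # match an integer or decimal numerator followed by '/10'
    match = re.search(r'(\d+(\.\d+)?)/10', str(text))
    if match:
        top, btm = match[0].split('/')
        return float(top) / float(btm)
    else:
        return None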
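
To make the behaviour of `text_to_rating` concrete, here is a minimal sketch that applies the same function to a few sample strings. The tweet texts are invented for illustration and are not taken from the dataset.

```python
import re

def text_to_rating(text):
    """Return the 'n/10' score in a tweet as a fraction, or None if absent."""
    match = re.search(r'(\d+(\.\d+)?)/10', str(text))
    if match:
        top, btm = match[0].split('/')
        return float(top) / float(btm)
    return None

# Hypothetical tweet texts, made up for this example only
samples = [
    "This is Max. He is a very good boy. 13/10",
    "Meet Luna. 9.5/10 would pet again",
    "No score appears in this tweet",
]

ratings = [text_to_rating(s) for s in samples]
print(ratings)
```

One limitation worth knowing: the pattern only looks for `/10`, so a string such as `13/100` would also be read as `13/10`.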

In [4]:
data['rating'] = data['text'].map(text_to_rating)
# drop the actual content of the tweets
data = data.drop(['text'], axis=1)
# drop the tweets in which no rating was found (their rating is NaN)
data = data[np.isfinite(data['rating'])]
# rename created_at
data.rename(columns={'created_at': 'date created'}, inplace=True)
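
The `np.isfinite` filter removes rows whose rating is missing (tweets where no score was found), but it keeps a genuine score of zero. A small sketch with made-up values shows the difference; the index labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: a normal score, a real 0/10, and a tweet with no score
ratings = pd.Series([1.3, 0.0, None], index=['scored', 'zero', 'no_score'])

# None is stored as NaN, which is not finite, so only that row is dropped
kept = ratings[np.isfinite(ratings)]
print(kept)
```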

In [24]:
data = data.sort_values(by='rating', ascending=False)
data.head()

| id | date created | rating |
|---|---|---|
| 749981277374128128 | 2016-07-04 15:00:45 | 177.6 |
| 855860136149123072 | 2017-04-22 19:05:32 | 66.6 |
| 855862651834028034 | 2017-04-22 19:15:32 | 42.0 |
| 912115536229752832 | 2017-09-25 00:44:24 | 42.0 |
| 912103168309432320 | 2017-09-24 23:55:16 | 42.0 |


In [25]:
# remove outliers: the data is sorted by rating in descending order,
# so this drops the 13 highest ratings
data_cleaned = data.iloc[13:].copy()
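
Dropping a fixed number of rows only works for this exact dataset. As an alternative sketch (not the approach used in this notebook), outliers could be removed with a rating cutoff instead. The cutoff of 2.0 is an assumption chosen for illustration, and the small DataFrame is a stand-in for the real data.

```python
import pandas as pd

# Stand-in data; the first three ratings match the outliers shown in the table above
toy = pd.DataFrame({'rating': [177.6, 66.6, 42.0, 1.4, 1.3]})

CUTOFF = 2.0  # hypothetical threshold: treat anything above 20/10 as an outlier

# keep only the rows at or below the cutoff
toy_cleaned = toy[toy['rating'] <= CUTOFF].copy()
print(toy_cleaned)
```

A cutoff keeps working if more tweets are added later, whereas `iloc[13:]` would need to be recounted.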

In [26]:
data_cleaned.head()

| id | date created | rating |
|---|---|---|
| 1056612914583367680 | 2018-10-28 18:25:24 | 1.4 |
| 1067836167595409408 | 2018-11-28 17:42:36 | 1.4 |
| 873697596434513921 | 2017-06-11 00:25:14 | 1.4 |
| 1045058704138280960 | 2018-09-26 21:13:05 | 1.4 |
| 1073035054782300160 | 2018-12-13 02:01:07 | 1.4 |


## 4. Explore

In [19]:
def create_plot(data):
    """A simple function that takes a dataframe formatted like ours
    and produces a scatterplot with a best fit line
    input:
        a pandas dataframe with 'rating' and 'date created' columns
    return:
        a plotly express scatterplot with a best fit line (untitled)
    """
    fig = px.scatter(data, x='date created', y='rating', trendline='ols')
    # highlight the best fit line in red to make it more visible
    fig.data[1].update(line_color='red')
    # show the tweets in the legend
    fig['data'][0]['showlegend'] = True
    fig['data'][0]['name'] = 'Tweet'
    # show the best fit line in the legend
    fig['data'][1]['showlegend'] = True
    fig['data'][1]['name'] = 'Best Fit Line (OLS)'
    fig.update_layout(showlegend=True)
    # return the figure so the caller can add a title and show it
    return fig

In [21]:
fig = create_plot(data)
fig.update_layout(title = 'Plot without outliers removed')
fig.show()

In [23]:
fig = create_plot(data_cleaned)
fig.update_layout(title = 'WeRateDogs scores given versus time')
fig.show()
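
The red line in these plots is an ordinary least squares (OLS) fit of rating against time. To show what that fit computes, here is a minimal sketch using `np.polyfit` rather than Plotly's built-in trendline. The three data points are invented for illustration, and converting dates to "days since the first tweet" is an assumption made so the slope has readable units.

```python
import numpy as np
import pandas as pd

# Hypothetical tweets, made up for this example only
toy = pd.DataFrame({
    'date created': pd.to_datetime(['2016-01-01', '2017-01-01', '2018-01-01']),
    'rating': [1.0, 1.2, 1.4],
})

# express each date as the number of days since the earliest tweet
days = (toy['date created'] - toy['date created'].min()).dt.days

# fit a straight line: rating = slope * days + intercept
slope, intercept = np.polyfit(days, toy['rating'], deg=1)

print('change in rating per year:', slope * 365)
```

A positive slope means scores tend to rise over time, which is the pattern the trendline makes visible.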

## 5. Interpret

## 6. Communicate

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)