# Rainy Days and Mondays Always Get Them Down
## Data Science Module 2 Final Lab
#### Tino Pietrassyk & David Haase (NYC-MHTN-DS-042219)
Typically, when you are writing something technical, you want an appealing headline. This helps you reach a wider audience and gives your work more visibility and impact. The headline of your tutorial should be a short title (typically 10 to 15 words at most). The title can include the technology and technique you are talking about, combined with an action verb that appeals to a wider audience. For instance, I titled one of my articles as follows: "Building RNNs is Fun with PyTorch and Google Colab". You can get pretty creative with the headline, but keep in mind that it is the face of your article and deserves plenty of tinkering before you finalize it.

## Scraping Weather & Sports Data into MongoDB
The project description marks the beginning of the tutorial you are writing. It should be clear, concise, and interesting. Here I suggest that you briefly explain what the following notebook tutorial does (usually one sentence). Then you can explain which technologies you will be using in the tutorial (usually one sentence). You can also briefly explain what value or knowledge the user will obtain after finishing the tutorial (a short list or two sentences will do). In addition, you can give credit to any notable resources you are using, and also briefly introduce yourself if the project description is not too lengthy.

In [34]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import datetime
import re
import pymongo

### Resources & Credits
#### Dark Sky
Implemented with the free version of **Dark Sky API** (https://darksky.net/poweredby/)
* Key: 9e82e12479d0f36289a78ea12a5cf8af

#### Project Template
Written with ❤️ by dair.ai

## [Data Loading]
The first step of the pipeline with any data science related tutorial is usually the data loading component. Besides visually describing the dataset in use to your audience, also try to briefly explain (in one or two sentences) where the data came from, i.e., the source of the data. Other specifications like dimensions and attribute type are important but can be neatly explained with examples using code and tools such as pandas.

## [Data Preprocessing]
Although it is sometimes not necessary, as some datasets already come preprocessed, I believe it is important to briefly mention what preprocessing steps the data has undergone, even if you need to do this through code examples. This should clear up any confusion that might otherwise arise during the modeling section of the tutorial. Remember, your audience wants to get a broad understanding of the data before the modeling component of the tutorial, so try to explain this part as clearly as possible with examples. Take advantage of your notebook features and other tools such as matplotlib and pandas.

In [69]:
class WeatherGetter:
    def __init__(self):
        # API key from Dark Sky
        #   https://darksky.net/dev/docs#time-machine-request
        self.api_key = '9e82e12479d0f36289a78ea12a5cf8af'

    def is_rain(self, day, city):

        # LOCATION
        # Use a Google Geocoding API
        g_key = 'AIzaSyDwqL2hBlyDmlB51IYF8dXXnBtbzFVZQ5E'
        try:
            response = requests.get(f'https://maps.googleapis.com/maps/api/geocode/json?address={city}&key={g_key}')
        except Exception as e:
            # Stop here: without a response there is nothing to parse below
            print(e)
            return 'Geocoding request failed'
        
        r = response.json()
        if r['status'] != 'OK':
            return 'City not valid'
        city_echo = r['results'][0]['address_components'][0]['long_name'] + ', ' + r['results'][0]['address_components'][2]['long_name']
        lat, long = r['results'][0]['geometry']['location']['lat'],  r['results'][0]['geometry']['location']['lng']
        
#         lat = 52.520008
#         long = 13.404954

        # DATE
        # Find an M/D/YYYY date in the string and convert it to a datetime
        date_patt = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4})')
        match = date_patt.search(day)
        if not match:
            return 'Invalid Date: M/D/YYYY'
        m, d, y = match.group(1), match.group(2), match.group(3)
        try:
            t = datetime.datetime(int(y), int(m), int(d))
        except ValueError:
            # e.g. 13/45/2012 matches the pattern but is not a real date
            return 'Invalid Date: M/D/YYYY'

        # EXCLUDE: minimize the JSON returned by the API by listing the data blocks we do not need
        exclude = 'currently,hourly'
        # int(t.timestamp()) replaces strftime("%s"), which is platform-dependent
        query = f'https://api.darksky.net/forecast/{self.api_key}/{lat},{long},{int(t.timestamp())}?exclude={exclude}'

        try:
            r = requests.get(query)
        except Exception as e:
            # Stop here: without a response there is nothing to parse below
            print(e)
            return 'Weather request failed'

        # PARSE FOR RAIN:  The Expected keys = dict_keys(['latitude', 'longitude', 'timezone', 'daily', 'flags', 'offset'])
        # We want to see if the string 'rain' exists in ['daily']['data'][0]['summary'] or ['daily']['data'][0]['icon']
        daily = r.json()['daily']['data'][0]
        summary, icon = daily['summary'], daily['icon']
        print(summary, icon, city_echo)
        return ('rain' in summary.lower()) or ('rain' in icon.lower())


w = WeatherGetter()
w.is_rain('07/01/2012', 'Southampton, New York')



Mostly cloudy in the evening. partly-cloudy-day Southampton, Suffolk County


False
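
The date handling and the rain check in `is_rain` can also be tried without any network access. The following is a minimal sketch, not the notebook's implementation: the helper names `parse_us_date` and `has_rain` are hypothetical, the sample payload is hand-written to mirror only the `daily.data[0].summary` and `daily.data[0].icon` fields read above, and the date is assumed to be midnight UTC (the notebook uses the machine's local time instead).

```python
import re
from datetime import datetime, timezone

# Same M/D/YYYY pattern as in the notebook's WeatherGetter.is_rain
DATE_PATTERN = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4})')


def parse_us_date(text):
    """Return a UTC datetime for the first M/D/YYYY date in text, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        # Assumption: midnight UTC, chosen so the result does not depend on the machine
        return datetime(year, month, day, tzinfo=timezone.utc)
    except ValueError:
        # The pattern matched, but the numbers do not form a real calendar date
        return None


def has_rain(payload):
    """Return True/False for the first daily entry, or None if there is no entry."""
    days = payload.get('daily', {}).get('data', [])
    if not days:
        return None
    first = days[0]
    text = f"{first.get('summary', '')} {first.get('icon', '')}".lower()
    return 'rain' in text


# Hand-written sample that mirrors only the fields the notebook reads
sample = {'daily': {'data': [{'summary': 'Mostly cloudy in the evening.',
                              'icon': 'partly-cloudy-day'}]}}

when = parse_us_date('Match played on 07/01/2012')
print(int(when.timestamp()), has_rain(sample))
print(parse_us_date('13/45/2012'))
```

Returning `None` for missing or invalid input keeps error cases distinguishable from a genuine "no rain" answer, which a returned error string would not do, since any non-empty string is truthy.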

## [Writing to the Database]
We write the records to our local instance of MongoDB.

In [27]:
# CONNECT -- build a connection to the local instance of MongoDB
address = 'mongodb://127.0.0.1:27017/'
db_name = 'euro_football'
collection_name = 'year_2011'

myclient = pymongo.MongoClient(address)
db = myclient[db_name]
mycollection = db[collection_name]

# Scratch test with dummy CLASS
class Team:
    def __init__(self, name):
        self.name = name
    
    def to_json(self):
        ret_val = {}
        ret_val['name'] = self.name
        return ret_val

d = Team('David')
t = Team('Tino')
teams = [d,t]

# INSERT -- try to insert a record for each team
try:
    insertion_results = mycollection.insert_many([team.to_json() for team in teams])
except Exception as e:
    print(e)
else:
    # SUM-CHECK -- compare results of insertion (runs only if the insert succeeded)
    print(f'{len(insertion_results.inserted_ids)} teams out of {len(teams)} inserted into {db_name}')

2 teams out of 2 inserted into euro_football
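
The insert-and-count check above can be wrapped in a small function so it can be exercised without a running database. This is a sketch under stated assumptions: `insert_teams` and `FakeCollection` are hypothetical names, and the fake collection only imitates the one thing the cell relies on, namely that `insert_many` returns an object with an `inserted_ids` list. A real `pymongo` collection, such as `mycollection` above, could be passed in its place.

```python
from dataclasses import dataclass, asdict
from types import SimpleNamespace


@dataclass
class Team:
    name: str

    def to_document(self):
        # Plain dict, the shape that insert_many expects for each record
        return asdict(self)


class FakeCollection:
    """In-memory stand-in for a MongoDB collection (illustration only)."""

    def __init__(self):
        self.docs = []

    def insert_many(self, docs):
        self.docs.extend(docs)
        # Imitate the result object: one id per stored document
        return SimpleNamespace(inserted_ids=list(range(len(docs))))


def insert_teams(collection, teams):
    """Insert one document per team and return how many were inserted."""
    docs = [team.to_document() for team in teams]
    if not docs:
        # Nothing to insert, so skip the database call entirely
        return 0
    try:
        result = collection.insert_many(docs)
    except Exception as exc:
        print(f'Insert failed: {exc}')
        return 0
    return len(result.inserted_ids)


teams = [Team('David'), Team('Tino')]
collection = FakeCollection()
inserted = insert_teams(collection, teams)
print(f'{inserted} teams out of {len(teams)} inserted')
```

Returning a count rather than printing inside the `try` block lets the caller compare it with `len(teams)` even when the insert fails.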


## [Testing Model]
One of the things I have learned over the years is that everything in data science is better understood with examples, rather than just plain code or pictures. Before you begin training your models, make sure to explain to the reader what the model expects as input and what it is expected to output. Rendering code here with clear descriptions helps prepare the reader for what to expect while training the model, especially since the training code is usually longer than most sections of the tutorial. With libraries like PyTorch and DyNet this is fairly easy, since they are dynamic computing libraries. TensorFlow 1.x also offers an eager execution command, tf.enable_eager_execution(), to evaluate operations immediately. This is what's called imperative programming, and I am glad they have it. It makes it easy to teach others about the beautiful things these tools are able to accomplish. I like to think that data science is about storytelling and discovery, and it should remain that way. Clear writing helps!
## [Training Model]
When training the models, you would specify what kind of optimization, hyperparameters, and data iteration methods you are using. To be honest, the training code is usually self-explanatory. If you did your job at the beginning, explaining your dataset and testing the model, this part of the tutorial is probably the one that needs the least explanation. In my experience, most data computing libraries use similar training strategies, so the training structure has become ubiquitous in some sense. If there is anything about your training that the reader still needs clarified, you can always explain it beforehand.

## [Evaluating Model]
And lastly, it is good practice to evaluate your models on some held-out samples of the dataset. This helps the reader get the gist of what the tutorial you just showed them contains. It also helps to re-emphasize the value the tutorial provides for the reader. This part of the tutorial also helps you wrap up your thoughts and share insights with your readers. Readers love insights. You can share plots, a lot of examples, and even explore the parameters of the model.

## [Final Words]
You are not writing a book, so it is not necessary to have a conclusion section. In my experience, you use the final section to summarize all your findings and the future ideas you are working on. This is also a great time to congratulate the reader for making it to the end of the tutorial -- that's a huge achievement. It shows that you appreciate your readers. Then you can end the section with your favorite quote.

And that's it! Congratulations on reaching the end of this primer. You are now more than equipped to deliver excellent tutorials to the whole data science community and to a wider audience. With this short primer, you should reach thousands, and hopefully millions, of readers. Most importantly, you should be able to bring value to your readers and keep expanding the human knowledge base.

## [References]
Remember to always give credit where it is due. It shows you are responsible and care for the long-term success of the community. Papers, other implementations, videos, code repositories, etc., are some of the things you should reference. If you don't want to include this very formal reference section, make sure to embed links throughout the tutorial as an alternative.

## [Other Tips]
Try to ensure that your notebook-based tutorials have a very nice flow. If you are using a lot of functions, it is a good idea to create separate Python files for them and import them here. You don't want your notebooks to be too detailed, but you also don't want them to be too flat.
Remember! You are teaching, not dictating. Ask questions, immerse the reader, and challenge them. There are various ways to do so.
More coming soon!