In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi']

# How to (Software) Engineer Real Good

The goal of this lecture is to provide a brief understanding of some of the most important concepts in good engineering practice:

- Be functional
- Version Control/Learn your tools
- Testing
- Writing Good Code
- Collaboration/Code Review
- Time Management/"Technical Debt"

## Writing Functional Code

There are two major paradigms for writing code: **imperative** and **functional**.  In imperative programming, you tell the computer how to do something, step by step.  In functional programming, you describe what you want computed, as a composition of functions.  Imperative code is highly stateful; functional code avoids mutable state.  The closer you are to the hardware, the more imperative your code tends to be, and imperative code is often more performant.  The higher up the stack you go, the more functional your code should be.  C, C++, and Java are much more imperative.  Haskell, SML, and Scala are more functional.  Python is in between, but its functional aspects tend to be less error prone.  Here are a few guidelines for avoiding common mistakes:

- Don't index into lists when iterating
- Don't build lists using `extend` or `append`
- Avoid mutating state: it's a bad idea in general.  If you must, do so inside a class or a generator.

In [None]:
# How would you improve this?
l = range(5)
for i in range(len(l)):
    print l[i]
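
One possible improvement, sketched here: iterate over the items directly, and reach for `enumerate` only when the position is also needed. The names `items` and `lines` are illustrative, and `print` is called with a single argument so the snippet behaves the same on Python 2 and Python 3.

```python
items = list(range(5))

# Iterate over the values themselves instead of indexing with range(len(...)).
for item in items:
    print(item)

# When the index really is needed, enumerate supplies it alongside the value.
lines = ["%d: %d" % (index, item) for index, item in enumerate(items)]
print(lines[0])
```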

In [None]:
# How would you improve these?
l1 = range(5)
l2 = []
s = set([])
d = {}

for x in l1:
    l2.append(str(x))
    s.add(str(x))
    d.update({x: str(x)})
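
A sketch of one answer: each container can be built in a single expression with a comprehension, so no container is mutated step by step. The variable names are illustrative; set and dict comprehensions assume Python 2.7 or later.

```python
source = range(5)

as_list = [str(x) for x in source]     # list comprehension
as_set = {str(x) for x in source}      # set comprehension
as_dict = {x: str(x) for x in source}  # dict comprehension
```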

In [None]:
# How would you improve this?
l1 = range(10)
l2 = []
    
for x in l1:
    if x % 3 == 0:
        l2.extend([str(x), str(x)])
    
print l2
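
One way to avoid `extend` here, as a sketch: a comprehension may contain several `for` clauses, so the filter and the duplication fit in one expression. The name `doubled` is illustrative.

```python
# Keep multiples of 3, and emit each kept value twice via the inner loop.
doubled = [str(x) for x in range(10) if x % 3 == 0 for _ in range(2)]
print(doubled)
```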

In [None]:
# How do you clean up the code
import numpy as np
l = range(10)

m0 = 0
m1 = 0.
m2 = 0.
for x in l:
    m0 += 1
    m1 += x
    m2 += x * x
    
print np.sqrt(m2 / m0 - (m1 / m0) * (m1 / m0))

In [None]:
import numpy as np

# You can clean up the code this way
class Std(object):
    def __init__(self):
        self._m0 = 0
        self._m1 = 0.
        self._m2 = 0.
        
    def add(self, x):
        self._m0 += 1
        self._m1 += x
        self._m2 += x * x
            
    def std(self):
        return np.sqrt(self._m2 / self._m0 - (self._m1 / self._m0) * (self._m1 / self._m0))
        
s = Std()
for x in l:
    s.add(x)
print s.std()
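
The guidelines above also mention generators as a place to keep mutable state. A minimal sketch of that alternative follows; `running_std` and its local names are illustrative, and it uses the same moment-based formula as the class.

```python
import math


def running_std(values):
    """Yields the population standard deviation of the values seen so far."""
    count, total, total_sq = 0, 0.0, 0.0
    for x in values:
        count += 1
        total += x
        total_sq += x * x
        mean = total / count
        # max() guards against tiny negative results from floating-point rounding
        yield math.sqrt(max(total_sq / count - mean * mean, 0.0))


stds = list(running_std(range(10)))
print(stds[-1])
```

The state lives only inside the generator's frame, so callers cannot accidentally modify it between updates.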

## Version control and other tools

Git is arguably the single most important tool you'll use in your time writing software. You should spend time learning the tools you use and setting up a workflow that is repeatable across projects. This will save you time and make you more productive in the long term.

HOW TO: There are **tons** of articles online for any tool you may use. For example, [here's a great article on understanding git](http://www.sbf5.com/~cduan/technical/git/). When collaborating with others, you'll run into merge conflicts - [never fear](https://css-tricks.com/deal-merge-conflicts-git/); plenty of articles exist on dealing with those. Whenever you need to pick up a new tool, spend a bit of time researching it before diving in. That little extra effort goes a long way to improving your baseline productivity.


Useful git commands you may not have heard of:

- `git bisect` -- determine exactly what change introduced a bug
- `git diff` -- diff the changes you've made against your commit history
- `git grep` -- search all files tracked in the repository
- `git log` -- list the "edited" history of your branch
- `git reflog` -- a 'real-time' history of the state of HEAD in your repo
- `git blame` -- tells you who wrote each line of code
- `git stash` -- 'stash away' the current changes to put out a fire, `git stash pop` to bring them back


The other most important tool at your disposal is your text editor. Get really good at using it. Take some time to Google 'useful {vim, sublime, emacs} commands'. Pick up a command-line-based editor - it's a really useful skill. Our remote workflow doesn't exist in a vacuum...

## Testing
Test, test, test. No, seriously. Also, in general, unit tests make more sense in `*.py` files, not in `*.ipynb`.  We're writing these in this file to make them easier to show.  These tests should be in a python file.

### Different Kinds of Tests

- Unit
- Integration
- Acceptance (end-to-end)

- *Unit Tests* confirm that one discrete, logical unit of code is working. External dependencies are "mocked" in unit tests.
- *Integration Tests* confirm that several logical parts of a system work correctly and interact in the way they should.
- *End-to-End Tests* confirm that the complete software package / application does what it's supposed to do under a variety of use cases.

### So, what does this look like?
Let's generate an example. How would we:

- unit test
- integration test
- end-to-end test

the below system? What kinds of things would we want to check?

In [None]:
import json, sys, requests
import simplejson
from requests_oauthlib import OAuth1

def get_100_tweets():
    """Returns 100 of the most recent tweets from Twitter's API."""
    auth = get_twitter_auth()
    return get_twitter_data(auth)


def get_twitter_auth():
    """
    Returns a Twitter auth object, by which we can access the api.
    """

    with open("secrets/twitter_secrets.json.nogit") as fh:
        secrets = simplejson.loads(fh.read())

    # create an auth object
    auth = OAuth1(
        secrets["api_key"],
        secrets["api_secret"],
        secrets["access_token"],
        secrets["access_token_secret"]
    )

    return auth

def get_twitter_data(auth):
    """ Pulls some data from twitter's sample streaming API
    
    Args:
        auth: An OAuth1 object

    Returns: 
        100 of the most recent 'sample' tweets.
    """
    # open a streaming connection to the sample endpoint
    r_stream = requests.get(
        'https://stream.twitter.com/1.1/statuses/sample.json',
        auth=auth, stream=True
    )
    counter = 0
    tweets = []
    for line in r_stream.iter_lines():
        # filter out keep-alive new lines
        if not line:
            continue
        
        tweet = json.loads(line)
        
        #only want substantive tweets
        if 'text' not in tweet:
            continue
        
        tweets.append(tweet)
        
        counter += 1
        # stop once exactly 100 tweets have been collected
        if counter >= 100:
            break
    
    return tweets


### The `unittest` and `mock` modules
`unittest` is the basis for almost all testing in Python. Other libraries/technologies built on Python (e.g. Flask) may have their own test handlers, but they ultimately plug in to `unittest`. This is nice because we have a common interface for testing pretty much any Python we write.

Other languages often have several testing frameworks of their own - in practice you'll either use what's already there, or just pick the most stable/established one and stick to it.

Similarly, mocking is a broader concept across software development. In Python it is provided by the `mock` library for Python 2, which ships in the standard library as `unittest.mock` from Python 3.3 onward. Basically, for tests, we want to stub out dependencies and external APIs and *assume* they return what we expect them to, and then have another test which confirms that the API still returns the format we expect.
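
As a minimal sketch of the idea (not the tests used later in this lecture), the snippet below replaces a dependency with a `Mock` that returns canned data. The function `count_text_tweets` and its `fetch_lines` parameter are hypothetical names introduced only for illustration.

```python
import json

try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # separate package on Python 2


def count_text_tweets(fetch_lines):
    """Counts the lines from fetch_lines() that decode to JSON with a 'text' key."""
    return sum(1 for line in fetch_lines() if line and 'text' in json.loads(line))


# The mock stands in for a network call and returns canned lines instead.
fake_fetch = mock.Mock(return_value=['{"text": "hi"}', '', '{"delete": {}}'])

text_tweet_count = count_text_tweets(fake_fetch)
# Verify the dependency was called exactly once, with no arguments.
fake_fetch.assert_called_once_with()
print(text_tweet_count)
```

The code under test never learns that its dependency was fake, which is what lets the unit test run without a network connection.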

In the example below, we mock Twitter's API in the unit test (using another library, [responses](https://github.com/getsentry/responses), that mocks the `requests` library for us). We grabbed some recent data from Twitter's API to serve as a local constant for the expected response format.

The integration test confirms the format of the response. (We could actually do a better job of this. How?)
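
One possible answer to that question, sketched under assumptions: check the type of each field rather than only the presence of `'text'`. The field list is a small subset chosen from the sample response in this lecture; `schema_problems` and `EXPECTED_FIELD_TYPES` are hypothetical names.

```python
import numbers

TEXT = type(u'')  # unicode on Python 2, str on Python 3

# Assumption: a handful of fields taken from the sample response, not a full schema.
EXPECTED_FIELD_TYPES = {
    'created_at': TEXT,
    'id': numbers.Integral,
    'id_str': TEXT,
    'text': TEXT,
    'user': dict,
}


def schema_problems(tweet):
    """Returns a list of problems; an empty list means the tweet looks fine."""
    problems = []
    for field, expected in sorted(EXPECTED_FIELD_TYPES.items()):
        if field not in tweet:
            problems.append('missing field: %s' % field)
        elif not isinstance(tweet[field], expected):
            problems.append('wrong type for field: %s' % field)
    return problems
```

Inside an integration test, each returned tweet could then be checked with `self.assertEqual(schema_problems(d), [])`, which reports every offending field in the failure message.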

In [None]:
import unittest
import responses

#contains one nice JSON response and one undesirable
SAMPLE_JSON_RESPONSES = [{u'contributors': None,
    u'coordinates': None,
    u'created_at': u'Thu Jul 16 21:27:35 +0000 2015',
    u'entities': {u'hashtags': [{u'indices': [29, 43],
         u'text': u'wildfiresteak'},
        {u'indices': [44, 54], u'text': u'steaktime'},
        {u'indices': [55, 69], u'text': u'chicagocraves'}],
     u'media': [{u'display_url': u'pic.twitter.com/g5w2459y4S',
         u'expanded_url': u'http://twitter.com/WildfireRest/status/621793324849086464/photo/1',
         u'id': 621793324010094592,
         u'id_str': u'621793324010094592',
         u'indices': [93, 115],
         u'media_url': u'http://pbs.twimg.com/media/CKENu9lUwAA1_zc.jpg',
         u'media_url_https': u'https://pbs.twimg.com/media/CKENu9lUwAA1_zc.jpg',
         u'sizes': {u'large': {u'h': 1462, u'resize': u'fit', u'w': 1024},
            u'medium': {u'h': 857, u'resize': u'fit', u'w': 600},
            u'small': {u'h': 485, u'resize': u'fit', u'w': 340},
            u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}},
         u'type': u'photo',
         u'url': u'http://t.co/g5w2459y4S'}],
     u'symbols': [],
     u'trends': [],
     u'urls': [{u'display_url': u'goo.gl/1VIuaR',
         u'expanded_url': u'http://goo.gl/1VIuaR',
         u'indices': [70, 92],
         u'url': u'http://t.co/DC2M278kU5'}],
     u'user_mentions': []},
    u'extended_entities': {u'media': [{u'display_url': u'pic.twitter.com/g5w2459y4S',
         u'expanded_url': u'http://twitter.com/WildfireRest/status/621793324849086464/photo/1',
         u'id': 621793324010094592,
         u'id_str': u'621793324010094592',
         u'indices': [93, 115],
         u'media_url': u'http://pbs.twimg.com/media/CKENu9lUwAA1_zc.jpg',
         u'media_url_https': u'https://pbs.twimg.com/media/CKENu9lUwAA1_zc.jpg',
         u'sizes': {u'large': {u'h': 1462, u'resize': u'fit', u'w': 1024},
            u'medium': {u'h': 857, u'resize': u'fit', u'w': 600},
            u'small': {u'h': 485, u'resize': u'fit', u'w': 340},
            u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}},
         u'type': u'photo',
         u'url': u'http://t.co/g5w2459y4S'}]},
    u'favorite_count': 0,
    u'favorited': False,
    u'filter_level': u'low',
    u'geo': None,
    u'id': 621793324849086464,
    u'id_str': u'621793324849086464',
    u'in_reply_to_screen_name': None,
    u'in_reply_to_status_id': None,
    u'in_reply_to_status_id_str': None,
    u'in_reply_to_user_id': None,
    u'in_reply_to_user_id_str': None,
    u'lang': u'en',
    u'place': None,
    u'possibly_sensitive': False,
    u'retweet_count': 0,
    u'retweeted': False,
    u'source': u'<a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>',
    u'text': u'When in doubt, filet it out. #wildfiresteak #steaktime #chicagocraves http://t.co/DC2M278kU5 http://t.co/g5w2459y4S',
    u'timestamp_ms': u'1437082055662',
    u'truncated': False,
    u'user': {u'contributors_enabled': False,
     u'created_at': u'Mon Apr 27 21:15:51 +0000 2009',
     u'default_profile': False,
     u'default_profile_image': False,
     u'description': u'Steaks, chops and seafood restaurant in a modern-day 1940s dinner club. Join us on Facebook: http://www.facebook.com/wildfirerestaurant',
     u'favourites_count': 1028,
     u'follow_request_sent': None,
     u'followers_count': 8576,
     u'following': None,
     u'friends_count': 6805,
     u'geo_enabled': False,
     u'id': 35863677,
     u'id_str': u'35863677',
     u'is_translator': False,
     u'lang': u'en',
     u'listed_count': 592,
     u'location': u'Chicago, Twin Cities, DC',
     u'name': u'Wildfire Restaurant',
     u'notifications': None,
     u'profile_background_color': u'3D0A03',
     u'profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/728505010/6f8baaa2a6f5f852b8d7e5242dbf1ab6.jpeg',
     u'profile_background_image_url_https': u'https://pbs.twimg.com/profile_background_images/728505010/6f8baaa2a6f5f852b8d7e5242dbf1ab6.jpeg',
     u'profile_background_tile': False,
     u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/35863677/1416527021',
     u'profile_image_url': u'http://pbs.twimg.com/profile_images/1258873654/Cut_Filet_Mignon_normal.jpg',
     u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/1258873654/Cut_Filet_Mignon_normal.jpg',
     u'profile_link_color': u'080101',
     u'profile_sidebar_border_color': u'FFFFFF',
     u'profile_sidebar_fill_color': u'690A0F',
     u'profile_text_color': u'050501',
     u'profile_use_background_image': True,
     u'protected': False,
     u'screen_name': u'WildfireRest',
     u'statuses_count': 18852,
     u'time_zone': u'Central Time (US & Canada)',
     u'url': u'http://wildfirerestaurant.com',
     u'utc_offset': -18000,
     u'verified': False}},
 {u'delete': {u'status': {u'id': 476806989751021568,
        u'id_str': u'476806989751021568',
        u'user_id': 425044801,
        u'user_id_str': u'425044801'},
     u'timestamp_ms': u'1437082056206'}}]

class TwitterAuthUnitTest(unittest.TestCase):
        
    #this runs before each test in the class, set state here
    def setUp(self):
        pass
    
    #this runs after each test
    def tearDown(self):
        pass

    #default test method runner name
    #there are other ways to load tests for running
    def runTest(self):
        auth_obj = get_twitter_auth()
        self.assertTrue(isinstance(auth_obj, OAuth1))
        with open("secrets/twitter_secrets.json.nogit") as fh:
            secrets = simplejson.loads(fh.read())
        client = auth_obj.client
        
        self.assertEqual(client.client_key, secrets["api_key"])
        self.assertEqual(client.client_secret, secrets["api_secret"])
        self.assertEqual(client.resource_owner_key, secrets["access_token"])
        self.assertEqual(client.resource_owner_secret, secrets["access_token_secret"])

class GetTweetsUnitTest(unittest.TestCase):

    def setUp(self):
        pass
    
    def tearDown(self):
        pass

    @responses.activate
    def runTest(self):
        responses.add(responses.GET,
                      'https://stream.twitter.com/1.1/statuses/sample.json',
                      body="\n".join(150*[
                            simplejson.dumps(SAMPLE_JSON_RESPONSES[0]),
                            simplejson.dumps(SAMPLE_JSON_RESPONSES[1])
                      ]),
                      status=200,
                      content_type='application/json')
                                    
        # the HTTP call is mocked, so no real credentials are needed
        resp = get_twitter_data(auth=None)
        #desired length
        self.assertEqual(len(resp), 100)
        #with only the desired content
        self.assertNotIn(SAMPLE_JSON_RESPONSES[1], resp)
        self.assertIn(SAMPLE_JSON_RESPONSES[0], resp)

In [None]:
class TwitterAuthIntegrationTest(unittest.TestCase):

    def setUp(self):
        pass
    
    def tearDown(self):
        pass

    def runTest(self):
        """
        Test we get a good and usable oauth object back
        """
        auth = get_twitter_auth()
        r = requests.get(
                "https://api.twitter.com/1.1/friends/ids.json",
                auth=auth, params={'screen_name' : 'tianhuil'}
        )
        self.assertEqual(r.status_code, 200)

        
class GetTweetsIntegrationTest(unittest.TestCase):
    def setUp(self):
        pass
    
    def tearDown(self):
        pass

    def runTest(self):
        """
        Test we get 100 good tweets back when actually calling 
        the twitter API.
        """
        auth = get_twitter_auth()
        data = get_twitter_data(auth)
        self.assertEqual(len(data), 100)
        for d in data:
            self.assertIn('text', d)
        

In [None]:
class Get100EndToEnd(unittest.TestCase):

    def setUp(self):
        pass
    
    def tearDown(self):
        pass

    def runTest(self):
        """
        Call function, inspect result to ensure it's what we want.
        """
        tweets = get_100_tweets()
        self.assertEqual(len(tweets), 100)
        for t in tweets:
            self.assertIn('text', t)

In [None]:
r = unittest.TestResult()
suite = unittest.TestSuite()
suite.addTests([Get100EndToEnd(), 
                GetTweetsIntegrationTest(), 
                TwitterAuthIntegrationTest(), 
                GetTweetsUnitTest(), 
                TwitterAuthUnitTest()])
suite.run(r)
print "==ERRORS=="
for test_case, tb in r.errors:
    print test_case.__class__.__name__
    print tb
    print "=========="
print "==FAILURES=="
for test_case, tb in r.failures:
    print test_case.__class__.__name__
    print tb
    print "=========="
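
The classes above each define a single `runTest` method and are collected by hand. A more common layout, sketched below, gives each check its own `test_`-prefixed method and lets a loader and `TextTestRunner` handle discovery and reporting. `ArithmeticTest` is a stand-in class for illustration, not part of the Twitter example.

```python
import unittest


class ArithmeticTest(unittest.TestCase):
    # Every method whose name starts with "test_" is collected as its own test.
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_membership(self):
        self.assertIn('text', {'text': 'hello'})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArithmeticTest)
# TextTestRunner prints each test name, its outcome, and any tracebacks.
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

In a `*.py` file, the same tests are typically run with `python -m unittest` instead of building the suite manually.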

## Testing the web in Flask
Basically, here's how each of these kinds of test works in Flask:

- Unit tests: 
Check that each endpoint returns what you expect, check authorization logic, check any other units of backend logic you've implemented. Notably, in unit testing, the database is meant to be mocked.

- Integration tests:
Plug in a database, check endpoints do what they are expected to do and appropriately update the DB, and ensure endpoints/backend logic play nicely on different inputs. 

- End-to-end-tests:
Use PhantomJS to simulate a browser and someone clicking around your website, with a clean version of your app spun up in a subprocess.

### More words on testing

1. **How much testing should I write?**
    
    This is very situational and depends on your time constraints, the importance of the feature, how complex the system is, etc... There are many schools of thought. One such school is...

2. **Test-driven development (TDD)**
   
    TDD works by writing the spec for what you're building _first_, as a series of tests that fail, and then building out the feature/system to conform to that spec. It forces you to think about what you're doing before you dive in and write (possibly messy or poorly designed) code - and hopefully avoid the pitfalls of hacking away without some forethought.

3. **Other tests and gracefully failing**

    You'll encounter many situations where tests don't neatly fit into one of these three categories. For example, sometimes increased server load causes things to break. For this, you can write _load tests_ that create a mirrored production environment and slam it with requests. 

More generally, you should try to design the software you build with corner cases and graceful failure in mind. Don't just design for failures you _know_ will happen (e.g. a "Page / Record not found" error) - also think about catch-all exception handlers for failure that you _don't know_ about (and in practice, there are lots of these).
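
To make the test-first workflow described under TDD above concrete, here is a minimal sketch. The helper `extract_hashtags` is hypothetical; its input shape follows the `entities` field of the sample tweet used earlier in this lecture.

```python
import unittest


# Step 1: write the spec first. Until step 2 exists, these tests fail
# with a NameError, which is the expected starting point in TDD.
class ExtractHashtagsSpec(unittest.TestCase):
    def test_returns_hashtag_texts_in_order(self):
        tweet = {'entities': {'hashtags': [{'text': 'steaktime'},
                                           {'text': 'chicagocraves'}]}}
        self.assertEqual(extract_hashtags(tweet), ['steaktime', 'chicagocraves'])

    def test_tweet_without_entities_has_no_hashtags(self):
        self.assertEqual(extract_hashtags({'text': 'no tags here'}), [])


# Step 2: write just enough code to make the spec pass.
def extract_hashtags(tweet):
    hashtags = tweet.get('entities', {}).get('hashtags', [])
    return [tag['text'] for tag in hashtags]


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractHashtagsSpec)
spec_result = unittest.TextTestRunner().run(suite)
```

Note that the second test covers a corner case (a tweet with no `entities` key) that is easy to forget when the implementation is written first.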

## Linting 
One other thing you should do when testing your software packages is lint your code. This means you run your code through a tool such as [pylint](http://www.pylint.org/), [eslint](https://github.com/eslint/eslint), or something similar. Your markup/styles can and should be linted as well (and plenty of tools exist; give it a Google).

In [None]:
# This won't pass pylint (why?)
def total(l):
    for i in l:
        total += i
    return total
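
One reading of the problem: because `total` is assigned inside the function, Python treats it as a local variable, so `total += i` reads it before it has a value and the call raises `UnboundLocalError`. The name also shadows the function itself. The exact messages a linter reports vary by tool and version. A corrected sketch, with illustrative variable names:

```python
def total(values):
    """Returns the sum of the items in values."""
    running_total = 0  # initialize the accumulator before the loop reads it
    for value in values:
        running_total += value
    return running_total
```

For plain numbers, the built-in `sum(values)` does the same job without any hand-written loop.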

In [None]:
# this function is bad (why?)

def fib(n):
    if n == 0:
        return 1
    if n == 1:
        return 1
    if n > 1:
        return fib(n-1) + fib(n-2)
    
def fib2(n):
    return 2 * fib(n)
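
Some of the issues a reviewer might raise: for negative `n` none of the branches match, so `fib` silently returns `None` and `fib2` then fails with a `TypeError`; the doubly recursive calls also take exponential time. The sketch below is one possible fix that keeps the original convention `fib(0) == fib(1) == 1`; raising `ValueError` is a design choice made here, not the only reasonable one.

```python
def fib(n):
    """Returns the n-th Fibonacci number, with fib(0) == fib(1) == 1."""
    if n < 0:
        # Fail loudly instead of implicitly returning None.
        raise ValueError('n must be non-negative, got %r' % (n,))
    previous, current = 1, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


def fib2(n):
    return 2 * fib(n)
```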

### Final Words on Testing

- Tests are only as good as the cases you can come up with.
- Check that things work, but also check corner cases, weird input, and any other stressors you can come up with. In practice this means that once you encounter an unexpected bug, you add a test for the case that caused the bug.
- Test all code. There are plenty of libraries and tutorials for testing frontend JS code.

## Writing "good code"

Lots of literature exists on this. [The Pragmatic Programmer](http://www.amazon.com/The-Pragmatic-Programmer-Journeyman-Master/dp/020161622X) is a classic, as is [Code Complete](http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670). 

Look at idiomatic code on GitHub (e.g. a well known library in your language of choice).

You'll notice in both of these books that the authors have a well-defined process in mind. The process you choose to follow is less important than consistently practicing the following:

- Writing tests
- Self-documenting code
- Separation of concerns 
- Not re-inventing the wheel (use libraries)
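
As an illustration of separation of concerns, the filtering logic of the Twitter example can be pulled out of the function that performs the HTTP request. This is a sketch under assumptions: `parse_tweets` is a hypothetical helper, not part of the lecture's code.

```python
import itertools
import json


def parse_tweets(lines, limit=100):
    """Returns up to `limit` decoded tweets that have a 'text' field.

    `lines` is any iterable of raw JSON lines; blank keep-alive lines are skipped.
    """
    decoded = (json.loads(line) for line in lines if line)
    with_text = (tweet for tweet in decoded if 'text' in tweet)
    # islice stops consuming the stream as soon as `limit` tweets are found.
    return list(itertools.islice(with_text, limit))
```

Because the helper accepts any iterable, it can be unit tested with a plain list of strings, while production code would pass it the line iterator of a streaming response.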

## Self-documenting code
In the author's mind, this boils down to taking the time to think about and write documentation, and explaining your process as you write out a function.

In [None]:
# Anti-example
def helper(a):
    #haha have fun figuring this out, future dev
    return list(itertools.islice((json.loads(l) for l in requests.get('https://stream.twitter.com/1.1/statuses/sample.json', auth=a, stream=True).iter_lines() if l and 'text' in json.loads(l)),0, 100))

In [None]:
# A good example
TWITTER_STREAM_URL = 'https://stream.twitter.com/1.1/statuses/sample.json'
def get_twitter_data(auth):
    """ Pulls some data from twitter's sample streaming API
    
    Args:
        auth: An OAuth1 object

    Returns: 
        100 of the most recent 'sample' tweets.
    """
    r_stream = requests.get(TWITTER_STREAM_URL, auth=auth, stream=True)
    counter = 0
    tweets = []
    for line in r_stream.iter_lines():
        # filter out keep-alive new lines
        if not line:
            continue
        
        tweet = json.loads(line)
        
        #only want substantive tweets
        if 'text' not in tweet:
            continue
        
        tweets.append(tweet)
        
        counter += 1
        # stop once exactly 100 tweets have been collected
        if counter >= 100:
            break
    
    return tweets

## Code review
In all probability, there will be at least one other technical person at whatever company you go to - even more likely, your team will have a fairly established code-review process for anything resembling code that goes into production. 
GitHub has a code review process set up that involves using "[Pull Requests](https://help.github.com/articles/using-pull-requests/)."* Take a look at a good one [here](https://github.com/mumrah/kafka-python/pull/195) - more generally, start poking around GitHub and looking at active open source projects.

There are alternatives to this code review style (for example, Gerrit in the git ecosystem, or the review tools built around other version control systems such as Perforce). The basic idea is the same though - you look at diffs of code people want to merge in and critique their style, identify bugs, and point out areas where code could be made more idiomatic/sped up/improved.

\*Note the whole idea of code review presupposes an understanding of version control in collaborative environments - this is why you should git comfy with git!

## Time Management
While good engineering is critically important, part of being a good engineer is understanding that we don't work in a vacuum. When at a job, we're working within the confines of an organization with specific requirements, deadlines, and constraints. 

One of the single most important things you can do as an engineer is respect those deadlines, set reasonable time estimates for yourself, and understand when it's appropriate to cut corners and accumulate [technical debt](http://blog.codinghorror.com/paying-down-your-technical-debt/) in the interest of getting things done.

### Lab: Test writing and `git bisect`

In this lab, we're going to add some tests to the Milestone Project Flask app, intentionally break those tests, and use `git bisect` to find the issue. 

#### 1. Unit Test
Your backend probably connects to Quandl either via `requests` or using the `Quandl` package. Either way, write a unit test for your backend logic that mocks the HTTP connection and ensures the call to get data from Quandl returns something correct.

This may involve factoring out the logic that makes this request into a helper function. Note this is something you generally want to do!

If you used `requests`, see the `Responses` module above. If you used `Quandl`, you can simply use mocking on `Quandl.get`. 
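
A minimal sketch of the mocking approach, under loud assumptions: `quandl_stub` is a stand-in module so the snippet runs without the real package, `latest_close` is a hypothetical helper, the `'WIKI/'` dataset prefix is a placeholder, and the canned rows are plain dicts standing in for whatever the real call returns.

```python
import types

try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # separate package on Python 2

# Stand-in for the real `Quandl` package (assumption for this sketch).
quandl_stub = types.ModuleType('quandl_stub')


def _network_get(dataset_code):
    raise RuntimeError('a unit test should never reach the network')


quandl_stub.get = _network_get


def latest_close(ticker, client=quandl_stub):
    """Returns the most recent closing price for ticker (hypothetical helper)."""
    rows = client.get('WIKI/' + ticker)
    return rows[-1]['Close']


canned_rows = [{'Close': 1.0}, {'Close': 2.5}]
# patch.object swaps in the mock only for the duration of the with block.
with mock.patch.object(quandl_stub, 'get', return_value=canned_rows) as fake_get:
    price = latest_close('GOOG')
    fake_get.assert_called_once_with('WIKI/GOOG')
```

In your own app, the same pattern would patch the `get` attribute of the module your backend actually imports.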

#### 2. Integration Test
Now, write an integration test that checks your backend logic without mocking `Quandl`.

#### 3. Break Your Code
We're going to introduce a breaking change. 

Modify your backend logic so that, instead of actually calling Quandl upon receiving a ticker symbol to query, it always returns recent history for Google (or any hardcoded data that otherwise works). In all likelihood, you didn't test for this. Make sure to commit this change.

In production code, bugs like this frequently occur due to caching issues. Cache bugs are often a bit more subtle than more obvious logical errors, since cached responses will "look" correct, aside from returning stale or possibly incorrect data.

To fix this change, let's do 2 things:

1. *In a new file*, write a test whose spec will catch this sort of error. On your latest commit, the test should fail. Leave this file untracked (do not commit it): `git bisect` checks out different versions of your code, so a test committed to the repo would disappear whenever an earlier commit is checked out, while an untracked file stays in your working directory. 
2. Follow [this tutorial](https://robots.thoughtbot.com/git-bisect) for `git bisect` to determine where the breaking change occurred (make sure to read the "Automate It" section before proceeding). For the sake of learning, don't simply say `HEAD` is broken and `HEAD~1` is working - go a few commits back so you can go through the `git bisect` flow. 
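
For the automated flow, `git bisect run` treats exit code 0 as "good" and most non-zero codes (1 through 127, except 125, which means "skip") as "bad". The sketch below shows one way a check script could map a test onto those codes; `is_hardcoded`, `exit_code`, and the ticker symbols are hypothetical, and `fetch` stands for whatever function your backend exposes.

```python
import sys


def is_hardcoded(fetch, tickers=('GOOG', 'AAPL')):
    """Returns True when fetch gives identical data for different tickers."""
    results = [fetch(ticker) for ticker in tickers]
    return all(result == results[0] for result in results[1:])


def exit_code(fetch):
    """Maps the check onto git bisect run's convention: 0 = good, 1 = bad."""
    return 1 if is_hardcoded(fetch) else 0


# In a real check script, the last line would hand the code to the shell, e.g.
#     sys.exit(exit_code(get_stock_data))
# where get_stock_data is imported from your own app (hypothetical name).
```

Comparing responses for two different tickers catches the hardcoded-data bug without needing to know what the correct data looks like.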

### Exit Tickets
1. What is the difference between unit, integration, and end-to-end tests?
1. What is technical debt and how is it used?
1. What is mocking and how is it used?

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*

*Copyright &copy; 2016 The Data Incubator.  All rights reserved.*