# Assignment 7: A Data Game

This assignment gives you more practice writing Python code.  You will use some pandas and data visualization tools, and you will also practice many of the tools and concepts learned earlier in the semester.

The game is a guessing game somewhat reminiscent of Guppies, but now competitive between two players.  The game shows the players a scatterplot of two columns selected randomly from a file and asks each of them to guess the correlation of the two columns.  The player with the closest guess in each round gets a point.  The player with the most points at the end of the game wins.  For an example of what the game should look like, see [`A7_sample_output.ipynb`](A7_sample_output.ipynb).

To organize your code, much of the game is broken into separate functions.  Each section below contains one function that you should develop.  Each function is specified in a **documentation comment** at the start of the function (often called a "docstring") describing what the function should do, what arguments it takes, and what it returns (if anything).  Follow the given specification exactly, and write your code for each function after the documentation comment.

After each function, there is a place to test the function.  Write some code matching the described test, and use that to test your function as you develop it.  The tests will help you know when each function is working correctly.  Feel free to write and use additional tests as well for further confirmation that your code is correct and/or for help finding bugs.

It is always a good idea to test any code you write with a variety of inputs.  In this directory, we've provided two .csv files to test against; feel free to add others to try as well.

### Saving Drafts and Submitting

Before you do anything else, execute the cell below. This will prompt you to log in and then save your work via an online submission system.

You can re-run the cell to submit your work as many times as you want before the deadline. We will only grade your final submission.

Any time you want to submit your work, select "Save Notebook" in the File menu (or press the Save icon, or press <kbd>Ctrl+S</kbd>) and then execute the cell again.  The result will contain a link that you can use to check that your assignment has been submitted successfully.

*[Executing this may print some errors saying "Javascript Error: IPython is not defined"; those may safely be ignored.]*

In [None]:
# This cell is just for submitting your work.
# Each time you execute it, a copy of this notebook will be uploaded to the submission system.
from client.api.notebook import Notebook
ok = Notebook('A7.ok')
import os
if not os.path.exists(os.path.join(os.environ.get("HOME"), ".config/ok/auth_refresh")):
    ok.auth(force=True)
else:
    ok.auth(inline=True)
_ = ok.submit()

### Authorship and Resources Used
* Include your name here
* If you received any assistance from anyone else, state who you consulted and specifically how they helped
* If you used any other resources, state what they were and specifically how they helped, and include links to the resources. [Markdown links use this formatting.](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#links)

***

## Imports

The following imports will be needed in your code below.  Do not remove any.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import getpass
import random

***

## Function: Find numeric columns in a dataframe

Tips:

 * Information about the columns in a dataframe can be accessed using `dataframe.dtypes`, which gives you a pandas Series of the columns' datatypes, indexed by column label.
 * To go through the labels and values of a Series together, you can use the `.items()` method on the Series object.  The `.items()` method returns a sequence of *tuples*, where each tuple contains a column label at index 0 and the matching datatype at index 1.
 * You can use a for loop to iterate over the sequence, and use indexing to access the label and the datatype within each tuple.
 * When pandas reads a CSV file, the datatype of a numeric column will normally be either `'int64'` or `'float64'`.
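To see what `.items()` produces, here is a small sketch on a made-up dataframe (the column names and values are illustrative only, not taken from the provided files). It prints each `(label, datatype)` tuple and then keeps the labels whose datatype is `'int64'` or `'float64'`, which is the same check `find_numeric_cols()` performs.

```python
import pandas as pd

# A small, made-up dataframe: one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "count": [3, 5, 8],
    "score": [3.5, 4.0, 2.5],
})

# Series.items() yields (label, value) tuples; for df.dtypes each value is a datatype.
for pair in df.dtypes.items():
    print(pair[0], pair[1])

# Keep only the labels whose datatype compares equal to 'int64' or 'float64'.
numeric_labels = [pair[0] for pair in df.dtypes.items()
                  if pair[1] == 'int64' or pair[1] == 'float64']
print(numeric_labels)
```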
 
**We're giving you this function.  You still need to write a test for it, below.**

In [None]:
def find_numeric_cols(df):
    """Returns a list of column labels for the numeric columns in a dataframe.
    
    Numeric columns have a datatype of either 'int64' or 'float64'.
    
    Args:
        df: A pandas dataframe.
    
    Returns:
        A list of strings, each one the label of a numeric column in df.
    """
    numeric_cols = []
    for column in df.dtypes.items():
        if column[1] == 'int64' or column[1] == 'float64':
            numeric_cols.append(column[0])
    return numeric_cols

#### Test

Create a dataframe from a given file; verify that `find_numeric_cols()` returns a list that contains all of its numeric columns and none of its non-numeric columns.
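One possible shape for this test is sketched below. To keep it self-contained, it repeats the provided `find_numeric_cols()` and builds a small dataframe in code; in your notebook you would instead load one of the provided files with `pd.read_csv()` and check against the columns you know are numeric. The column names here are made up.

```python
import pandas as pd

def find_numeric_cols(df):
    """Same logic as the provided function, repeated so this block runs on its own."""
    numeric_cols = []
    for column in df.dtypes.items():
        if column[1] == 'int64' or column[1] == 'float64':
            numeric_cols.append(column[0])
    return numeric_cols

# Made-up stand-in for a dataframe read from a CSV file.
df = pd.DataFrame({
    "title": ["X", "Y", "Z"],
    "votes": [120, 45, 300],
    "rating": [4.5, 3.0, 4.0],
})

result = find_numeric_cols(df)
# Every numeric column should be in the result...
assert "votes" in result and "rating" in result
# ...and no non-numeric column should be.
assert "title" not in result
print("find_numeric_cols() passed:", result)
```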

In [None]:
# Test find_numeric_cols()

***

## Function: Display one "problem" (scatterplot)

In [None]:
def show_pair(df, col1, col2):
    """Shows a scatterplot of the given columns from the given dataframe.
    
    Args:
        df: A pandas dataframe.
        col1 (str): Label of the column for the scatterplot's x-axis.
        col2 (str): Label of the column for the scatterplot's y-axis.
    """

#### Test

Create a dataframe from a given file, call `show_pair()` with the dataframe and two known numeric columns, and verify that a scatterplot of those columns is shown.
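A minimal sketch of how such a plot could be drawn with matplotlib's `scatter()`, using a made-up dataframe and the hypothetical column labels `"x"` and `"y"`. Labeling the axes with the column names lets you confirm by eye that the intended columns were plotted; this is one reasonable approach, not the only acceptable one.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data with two numeric columns.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2.0, 3.9, 6.1, 8.0, 9.8]})
col1, col2 = "x", "y"

fig, ax = plt.subplots()
ax.scatter(df[col1], df[col2])  # one point per row of the dataframe
ax.set_xlabel(col1)             # label each axis with its column name
ax.set_ylabel(col2)
plt.show()
```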

In [None]:
# Test show_pair()

***

## Function: Setup and display one problem (one round) of the game

Tips:

 * To choose a column randomly, you can use the [`random.choice()` [docs]](https://docs.python.org/3/library/random.html#random.choice) function in the `random` module (already imported above).
 * To choose two randomly at once, [`random.choices()` [docs]](https://docs.python.org/3/library/random.html#random.choices) will work.  You will have to specify a value for the `k` argument.  The function returns a list of `k` elements, and you can use indexing to access the individual choices.
 * Using `random.choice()` twice might be easier -- either works.
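The sketch below contrasts these options on a made-up dataframe and then computes a correlation with pandas' `Series.corr()`, which uses Pearson correlation by default. Both calling `random.choice()` twice and `random.choices()` sample *with replacement*, so the two picks can be the same column (whose correlation with itself is 1.0); if you prefer two different columns, `random.sample(labels, 2)` is one option.

```python
import random
import pandas as pd

# Made-up numeric dataframe.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],   # exactly proportional to a
    "c": [4, 3, 2, 1],   # a in reverse order
})
labels = list(df.columns)

# Option 1: random.choice() twice (the two picks may be equal).
x_col = random.choice(labels)
y_col = random.choice(labels)

# Option 2: random.choices() with k=2 returns a list of 2 picks (may repeat).
picks = random.choices(labels, k=2)

# Option 3: random.sample() returns 2 *different* labels.
distinct = random.sample(labels, 2)

# Pearson correlation between the two chosen columns.
r = df[x_col].corr(df[y_col])
print(x_col, y_col, r)
```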

In [None]:
def next_problem(df):
    """Creates and displays a new problem for the game.
    
    Given a dataframe, this chooses two numeric columns randomly from
    the dataframe, shows a scatterplot of those columns using show_pair(),
    and returns the correlation of those two columns.
    
    Args:
        df: A pandas dataframe.
        
    Returns (float):
        The correlation value for the two selected columns.
    """

#### Test

Create a dataframe from a given file, and call `next_problem()` a few times, verifying that it shows a scatterplot of a random selection of 2 numeric columns from the dataframe each time it is called **and** that it returns the correlation between those columns each time.

In [None]:
# Test next_problem()

***

## Function: Get guesses from players and find closest to correct value

Tips:

 * To ask the user for input without showing what they type on the screen, use [`getpass.getpass()`](https://docs.python.org/3/library/getpass.html#getpass.getpass) from the `getpass` module (already imported above).  It works almost identically to the `input()` function, but it replaces whatever characters the user types with dots.
 * Make sure to compare the *absolute values* of the differences between each guess and the correct value using `abs()`.  Guesses can be close to the correct value either above or below it, and it's only the magnitude of the difference that matters here, not the direction (above or below).
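Here is a sketch of just the comparison step, written as a hypothetical helper `closest_player()` (not one of the assignment's required functions) so it can be tried without typing guesses. One detail worth noting: like `input()`, `getpass.getpass()` returns a string, so a guess must be converted with `float()` before it can be compared numerically.

```python
def closest_player(player1, guess1, player2, guess2, correct):
    """Hypothetical helper: name of the player whose guess is closer, or None on a tie."""
    # Only the size of each miss matters, not whether the guess was high or low.
    diff1 = abs(guess1 - correct)
    diff2 = abs(guess2 - correct)
    if diff1 < diff2:
        return player1
    if diff2 < diff1:
        return player2
    return None  # equal distances: a tie


# getpass.getpass() returns text, so convert it first; "0.25" stands in for typed input.
guess = float("0.25")

print(closest_player("Ann", 0.5, "Bo", -0.2, 0.4))    # prints Ann (about 0.1 away vs 0.6)
print(closest_player("Ann", 0.3, "Bo", 0.6, 0.5))     # prints Bo (misses high, but by less)
print(closest_player("Ann", 0.5, "Bo", 0.25, 0.375))  # prints None (both exactly 0.125 away)
```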

In [None]:
def get_check_guesses(player1, player2, correct):
    """Ask for and check guesses from two players.
    
    Each player is asked for a guess using getpass.getpass() to hide their input from the
    other player.  Then the actual value is displayed.  The function prints a message
    stating which player's guess was closest to the actual value, or it prints a message
    saying both were an equal distance from the value if that is the case.  The name
    of the player with the closest guess is returned, or None is returned if the guesses
    were equal distances from the correct value.
    
    Args:
        player1 (string): Name of the first player.
        player2 (string): Name of the second player.
        correct (float): Correct value the players are attempting to guess.
        
    Returns:
        player1, player2, or None, depending on whether the first player's guess
        was closest, the second player's guess was closest, or both guesses were
        equally close, respectively.
    """

#### Test

Call `get_check_guesses()` a few times with two player name strings and a particular correct value.  Verify that guesses are recorded without displaying them, that each player can 'win' if they are the closest one to the correct value, that the function returns the correct name (or None) in all cases, and that ties are handled correctly.

In [None]:
# Test get_check_guesses()

***

## Function: Add additional columns (adds variety)

Because this just adds variety and is not critical to the functioning of the game, it's probably best to implement and test everything else first.

In [None]:
def add_negations(df):
    """Creates a new dataframe containing columns from a given dataframe and their negations.
    
    Args:
        df: a pandas dataframe.  The function can assume that this only contains numeric columns.
    
    Returns:
        A new dataframe containing all columns from df plus negated (times -1) versions of those columns.
    """

#### Test

Setup: Create a dataframe from a given file, use `find_numeric_cols()` to find the numeric columns, and then make a new dataframe containing just those numeric columns.

Test: Give that dataframe of numeric columns to `add_negations()`, verifying that it returns a new dataframe containing additional negated columns, one for each column in the given dataframe.  Also verify that the original dataframe has not been changed by calling this function; that is, make sure it does return a copy and adds the new columns only to the copy.
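A sketch of the setup and the negation step on a made-up dataframe: select the numeric columns by label and `.copy()` them, then add a negated version of each column to a *second* copy so the input is left unchanged. The `"-"` prefix used for the new column labels is an illustrative naming choice, not something the specification requires. Negating a column flips the sign of its correlation with any other column, which is why this adds variety to the game.

```python
import pandas as pd

# Made-up dataframe with one non-numeric column.
raw = pd.DataFrame({
    "title": ["X", "Y", "Z"],
    "a": [1, 2, 3],
    "b": [0.5, -1.0, 2.0],
})

# Setup: keep only the numeric columns (labels written out by hand here) as an independent copy.
numeric = raw[["a", "b"]].copy()

# Build the result on a copy so `numeric` itself is not modified.
with_neg = numeric.copy()
for label in numeric.columns:
    with_neg["-" + label] = -numeric[label]  # "-" prefix is just an example naming scheme

print(with_neg)
```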

In [None]:
# Test add_negations()

***

## Main program

Write the code for the game here, using all of the functions you have defined above.  See `A7_sample_output.ipynb` for an example of how the game should work and what it should output.  A few things that may not be evident from the sample output:

 * The program should make a copy of just the numeric columns from the selected data and use that for the game.
 * For additional variety in the generated scatterplots and correlation values, the program should add negated versions of all numeric columns to the dataframe before starting.
 * If the two players tie, the program should output `"Tie!  No winner this time."`
 * The game should repeat for at least 4 rounds.  (You can make it longer if you prefer.)
 * The dataset `fandango_score_comparison.csv` has been provided here, but you can upload other datasets to try them out in the game as well.
 * To pause with a prompt of "Press enter to continue...", you can use a call to `input()` with that prompt string.
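The scoring part of the main loop can be sketched independently of the plotting and input steps. Below, a hard-coded list stands in for what `get_check_guesses()` might return over four rounds (a player's name, or `None` for a tie); the player names and results are made up. A dictionary keyed by player name keeps the points, and rounds that return `None` award nothing.

```python
# Made-up player names and per-round results standing in for get_check_guesses() return values.
player1, player2 = "Ann", "Bo"
round_results = ["Ann", None, "Bo", "Ann"]

# One running point total per player.
scores = {player1: 0, player2: 0}

for winner in round_results:
    if winner is not None:  # None means that round was a tie, so nobody scores
        scores[winner] += 1

# Decide the overall result from the final totals.
if scores[player1] > scores[player2]:
    overall = player1
elif scores[player2] > scores[player1]:
    overall = player2
else:
    overall = None  # equal totals

print(scores, overall)
```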

In [None]:
print("=======================")
print("Welcome to DataGuppies!")
print("=======================")
print()

# Write your program here...