In [35]:
import matplotlib.pyplot as plt
import mplcursors
import pandas as pd
import matplotlib.dates as mdates

## Creating the DataFrame

To begin, we need a way to organize our data. Create a list called "columns" with the values "Date", "Lessons Completed", and "Summary".

In [36]:
columns = ["Date", "Lessons Completed", "Summary"]

Next, create a list of lists called "rows" to fill with your weekly entries.

Create an example entry for testing purposes. 

* "Date" represents the date of the club meeting in the string format "month/day/year". 

* "Lessons Completed" will be an integer representing the number of Udemy lessons completed during that club meeting. 

* "Summary" will be a short description of your personal progress, giving anyone looking at your Progress Journal a deeper understanding of what you accomplished on that specific day. This is the most important column, as it shows potential employers that you can think critically about the big picture of Data Science and communicate ideas clearly and concisely.

In [37]:
rows = [["9/10/19", 2, "First day at the Data Science Club. I completed the introductory course to Python on Udemy, and learned basic data structures."],
        ["9/17/19", 4, "Moved on to work on NumPy and Pandas. Joined a project group."],
        ["9/24/19", 5, "Really did a lot of course work today, mastered Matplotlib and SciKit-Learn. Specifically, I learned a lot about Scikit's Random Forest."],
        ["10/8/19", 3, "After skipping last week, met back up with my group members to review progress on our sick project. We allocated responsibilities and set a weekly meeting time for group-check-ins."]]
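The three column descriptions above can be checked mechanically before an entry goes into the journal. The following is a minimal sketch, not part of the original notebook: `validate_entry` is a hypothetical helper, and the `"%m/%d/%y"` format string is an assumption based on the "month/day/year" description and the sample dates.

```python
from datetime import datetime

def validate_entry(entry):
    """Return True if entry looks like [date string, lesson count, summary]."""
    if len(entry) != 3:
        return False
    date_text, lessons, summary = entry
    try:
        # Assumed format: month/day/two-digit year, e.g. "9/10/19"
        datetime.strptime(date_text, "%m/%d/%y")
    except (TypeError, ValueError):
        return False
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(lessons, bool) or not isinstance(lessons, int) or lessons < 0:
        return False
    # The summary must be a non-empty string
    return isinstance(summary, str) and bool(summary.strip())

example_rows = [
    ["9/10/19", 2, "First day at the Data Science Club."],
    ["13/40/19", 2, "Impossible date."],
    ["9/17/19", "four", "Lesson count is not an integer."],
]
results = [validate_entry(row) for row in example_rows]
print(results)
```

Only the first example row passes; the other two fail on the date and the lesson count respectively.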

Combine your two variables "rows" and "columns" into a pandas DataFrame, and display it in the Jupyter Notebook to make sure it was built correctly.

In [38]:
df = pd.DataFrame(rows, columns=columns)
df["Date"] = pd.to_datetime(df["Date"])
df.head()

|   | Date | Lessons Completed | Summary |
|---|------|-------------------|---------|
| 0 | 2019-09-10 | 2 | First day at the Data Science Club. I complete... |
| 1 | 2019-09-17 | 4 | Moved on to work on NumPy and Pandas. Joined a... |
| 2 | 2019-09-24 | 5 | Really did a lot of course work today, mastere... |
| 3 | 2019-10-08 | 3 | After skipping last week, met back up with my ... |


Check what datatypes are present in the dataframe.

In [39]:
df.dtypes

Date                 datetime64[ns]
Lessons Completed             int64
Summary                      object
dtype: object

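For ease of analysis and plotting, the "Date" column should hold datetime values rather than strings. The cell that built the DataFrame already called pd.to_datetime, which is why the dtype above is datetime64[ns]; running the conversion again is harmless. List the dtypes once more to confirm.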

In [40]:
df["Date"] = pd.to_datetime(df["Date"])
df.dtypes

Date                 datetime64[ns]
Lessons Completed             int64
Summary                      object
dtype: object
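When the date strings follow one known layout, passing an explicit `format` to `pd.to_datetime` avoids any guessing about which number is the month. The sketch below assumes the "month/day/two-digit year" layout used in the sample entries; the `dates` Series is illustrative data, not the notebook's DataFrame.

```python
import pandas as pd

# Illustrative date strings in the same layout as the journal entries
dates = pd.Series(["9/10/19", "9/17/19", "10/8/19"])

# An explicit format makes the month/day order unambiguous
parsed = pd.to_datetime(dates, format="%m/%d/%y")

print(parsed.dtype)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

A string that does not match the format raises an error instead of being silently misread, which is usually what you want in a journal that grows week by week.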

## Plotting with Matplotlib

We want to visualize our DataFrame in a way that is easy to understand. We'll put the date on the x-axis and the number of lessons completed on the y-axis. If we use the most basic Matplotlib call, however, we get the following visualization:


In [48]:
%matplotlib notebook
fig, ax = plt.subplots(figsize=(10, 5))
plt.plot(
    df["Date"],               # Dates on x-axis
    df["Lessons Completed"],  # Number of lessons completed on y-axis
)
plt.show()


There are quite a few things wrong with this visualization. To fix them, we can be more specific with our parameters.

The first issue that stands out is the ticks on the x- and y-axes. The y-axis ticks are floats, but this data consists of integers: finishing half a lesson isn't recorded as 0.5, since you either finish a lesson or you don't.

Create a variable yrange with the list of explicitly stated integer values that should be displayed on the y-axis.

In [42]:
# Deriving the ticks from the data is much better than hard-coding [0, 1, 2, 3, 4, 5],
# since the DataFrame values will change over time and this expression adapts with them.
yrange = range(0, int(df["Lessons Completed"].max()) + 1)  # +1 because range() excludes its stop value
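As an alternative to building the tick list by hand, Matplotlib can be asked to place integer-only ticks itself. This is a sketch of that option using `matplotlib.ticker.MaxNLocator` with `integer=True`; the `lessons` list is illustrative data standing in for the "Lessons Completed" column.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Illustrative values standing in for df["Lessons Completed"]
lessons = [2, 4, 5, 3]

fig, ax = plt.subplots()
ax.plot(range(len(lessons)), lessons, marker="o", linestyle="None")

# Let Matplotlib choose the tick positions, restricted to whole numbers
ax.yaxis.set_major_locator(MaxNLocator(integer=True))

ticks = ax.get_yticks()
print(ticks)
plt.close(fig)
```

The explicit range gives full control over which ticks appear, while the locator keeps working unchanged if the y-axis limits are later adjusted.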

To fix the awkward tick placement on the x-axis, we need some lesser-known tools from the matplotlib.dates module. We set the major locator of the x-axis to every Tuesday, since that's when the Data Science Club meets.

In [43]:
from matplotlib.dates import TU, WeekdayLocator
xrange = WeekdayLocator(byweekday=TU)
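A Tuesday locator only lines up with the data points if every entry really falls on a Tuesday. The quick check below is a sketch using pandas' `dt.day_name()`; the `meeting_dates` Series repeats the sample dates from the example entries.

```python
import pandas as pd

# The four sample meeting dates from the example entries
meeting_dates = pd.to_datetime(
    pd.Series(["9/10/19", "9/17/19", "9/24/19", "10/8/19"]),
    format="%m/%d/%y",
)

# Name of the weekday for each meeting
day_names = meeting_dates.dt.day_name().tolist()
print(day_names)  # ['Tuesday', 'Tuesday', 'Tuesday', 'Tuesday']
```

If an entry were logged on another weekday, its marker would sit between two ticks rather than on one.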

In [44]:
def visualize_progress():
    %matplotlib notebook
    plt.style.use("seaborn-darkgrid") # Styling for plot (named "seaborn-v0_8-darkgrid" in Matplotlib 3.6+)
    fig, ax = plt.subplots(figsize = (10, 5))
    ax.xaxis.set_major_locator(xrange) # Setting ticks every Tuesday from our xrange variable
    xfmt = mdates.DateFormatter('%m/%d/%y') # Setting string format of x label (portable equivalent of '%D')
    ax.xaxis.set_major_formatter(xfmt)
    plt.plot(df["Date"], # Dates on x-axis
             df["Lessons Completed"], # Number of lessons completed on y-axis
             linestyle='None', # No lines connecting points
             marker="o", # Setting marker shape
             markersize = 5 # Size of marker
            )
    
    plt.yticks(yrange) # Explicitly setting yticks from our variable yrange.

    ##### Setting Titles #####
    plt.ylabel("Lessons Completed", labelpad = 15, size = 12)
    plt.xlabel("Date of Meeting", labelpad = 15, size = 12)
    plt.title("Parker's Data Science Club Activity", pad = 15, size = 16)
    
             
    ##### Connecting mplcursors for Interactive Element #####      
    cursor = mplcursors.cursor(fig)
    cursor.connect('add', lambda sel: sel.annotation.set_text(df.Summary.iloc[sel.target.index]))


## Formatting Summary Annotations

If your "Summary" values are thorough enough, you are likely seeing the text get cut off at the edge of the plot. We should fix that to create a neater user experience.

1. Write a function below that inserts a newline character ("\n") after every 5 words so that the text doesn't run off the figure.
2. Then, apply it to the "Summary" column of your DataFrame.

In [45]:
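def Add_Newlines(text, words_per_line=5):
    # Split into words, regroup them into lines of `words_per_line` words,
    # then join the lines with newline characters.
    words = text.split()
    lines = [" ".join(words[i:i + words_per_line])
             for i in range(0, len(words), words_per_line)]
    return "\n".join(lines)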

In [46]:
df["Summary"] = df["Summary"].apply(Add_Newlines)
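Wrapping by word count can still produce long lines when the words themselves are long. One possible alternative, sketched here, is the standard library's `textwrap.fill`, which wraps by character width instead. The width of 30 characters is an arbitrary assumption; tune it to your figure size.

```python
import textwrap

summary = "After skipping last week, met back up with my group members to review progress."

# Break the text so that no line exceeds 30 characters
wrapped = textwrap.fill(summary, width=30)
print(wrapped)
```

To use it on the journal, the same call could be applied to the "Summary" column, for example with `df["Summary"].apply(textwrap.fill, width=30)`.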

In [47]:
visualize_progress()
