Thomas Brown - 12/01/2020

# Technical Assessment for Data Analyst Position

# Importing Libraries:

In [7]:
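import pandas as pd
import sqlite3
from IPython.display import display  # built into Jupyter; explicit import for use outside notebooks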

# Importing Data:

In [5]:
df = pd.read_csv('issues.csv')  # Importing the data
display(df.head())  # and checking out the first few rows
df.info()  # No null values, so the data is likely fairly clean (info() prints; it returns None)
display(df.shape)  # 6,907 rows and 17 columns

| | created_at | number | state | label_bug | label_more_info_needed | label_feature_request | label_help_wanted | num_comments | num_commenters | reaction_eyes | reaction_rocket | reaction_thinking_face | reaction_thumbs_up | reaction_heart | reaction_tada | reaction_thumbs_down | reaction_smile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-05-12 | 1 | closed | False | False | False | False | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2016-05-12 | 2 | closed | False | False | False | False | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2016-05-12 | 3 | closed | False | False | False | False | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2016-05-12 | 4 | closed | False | False | False | False | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2016-05-12 | 5 | closed | False | False | False | False | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6907 entries, 0 to 6906
Data columns (total 17 columns):
created_at                6907 non-null object
number                    6907 non-null int64
state                     6907 non-null object
label_bug                 6907 non-null bool
label_more_info_needed    6907 non-null bool
label_feature_request     6907 non-null bool
label_help_wanted         6907 non-null bool
num_comments              6907 non-null int64
num_commenters            6907 non-null int64
reaction_eyes             6907 non-null int64
reaction_rocket           6907 non-null int64
reaction_thinking_face    6907 non-null int64
reaction_thumbs_up        6907 non-null int64
reaction_heart            6907 non-null int64
reaction_tada             6907 non-null int64
reaction_thumbs_down      6907 non-null int64
reaction_smile            6907 non-null int64
dtypes: bool(4), int64(11), object(2)
memory usage: 728.6+ KB


(6907, 17)
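
`df.info()` reports `created_at` as `object` dtype, i.e. plain text. Several of the questions below group issues by month, so one option is to parse the column as dates at load time with `read_csv`'s `parse_dates` argument. A minimal sketch, in which three rows copied from the `head()` output (a subset of the columns) stand in for `issues.csv` so the example runs on its own:

```python
import io

import pandas as pd

# Three rows copied from the head() output (subset of columns), standing in for issues.csv.
sample_csv = """created_at,number,state,label_bug
2016-05-12,1,closed,False
2016-05-12,2,closed,False
2016-05-12,3,closed,False
"""

# parse_dates turns created_at from text (object dtype) into datetime64,
# which enables .dt accessors such as .dt.to_period("M") for monthly counts.
issues = pd.read_csv(io.StringIO(sample_csv), parse_dates=["created_at"])
print(issues.dtypes)
```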

## Column Descriptions:

- created_at: the day the issue was created
- number: the issue number
- state: whether the issue is open or closed
- label_bug: true if the "bug" label was applied to the issue
- label_more_info_needed: true if the "more info needed" label was applied to the issue
- label_feature_request: true if the "feature request" label was applied to the issue
- label_help_wanted: true if the "help wanted" label was applied to the issue, indicating the issue is a good one for an outside contributor to work on
- num_comments: the number of comments on the issue
- num_commenters: the number of people commenting on the issue
- reaction_eyes: the number of people who have added the eyes emoji reaction to the issue
- reaction_rocket: the number of people who have added the rocket emoji reaction to the issue
- reaction_thinking_face: the number of people who have added the thinking face emoji reaction to the issue
- reaction_thumbs_up: the number of people who have added the thumbs up emoji reaction to the issue
- reaction_heart: the number of people who have added the heart emoji reaction to the issue
- reaction_tada: the number of people who have added the tada/party emoji reaction to the issue
- reaction_thumbs_down: the number of people who have added the thumbs down emoji reaction to the issue
- reaction_smile: the number of people who have added the smile emoji reaction to the issue
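
The four `label_*` columns are independent booleans, so summing one counts how many issues carry that label, and a single row can have several labels set or none at all. A small sketch on hypothetical toy rows (made-up values, not taken from `issues.csv`):

```python
import pandas as pd

# Hypothetical toy rows using the four label columns described above.
toy = pd.DataFrame({
    "label_bug": [True, False, True, False],
    "label_more_info_needed": [False, False, True, False],
    "label_feature_request": [False, True, False, False],
    "label_help_wanted": [False, True, False, False],
})

# Select every label column by its shared "label_" prefix.
labels = toy.filter(regex=r"^label_")

# Summing a boolean column counts its True values: issues per label.
label_counts = labels.sum()

# Rows where no label is set at all.
unlabelled = (~labels.any(axis=1)).sum()
print(label_counts)
print("Issues with no label:", unlabelled)
```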

# Initializing SQLite3 DB:

In [25]:
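conn = sqlite3.connect("issues_db.sqlite")
c = conn.cursor()
# Write the DataFrame to an "issues" table; if_exists="replace" lets the cell be
# re-run without a "table already exists" error
df.to_sql("issues", conn, index=False, if_exists="replace")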
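
The pattern used in each query cell (`c.execute(...)`, then `pd.DataFrame(c.fetchall())` with column names copied from `c.description`) can be collapsed into a single call to pandas' `read_sql_query`, which returns a DataFrame with the column names already set. A minimal sketch, assuming an in-memory database and a hypothetical two-row stand-in for the `issues` table; the helper name `query` is illustrative, not part of any library:

```python
import sqlite3

import pandas as pd

# Hypothetical two-row stand-in for the issues table.
toy = pd.DataFrame({"number": [1, 2], "state": ["open", "closed"]})

# ":memory:" keeps the example self-contained; the notebook itself uses issues_db.sqlite.
conn = sqlite3.connect(":memory:")
toy.to_sql("issues", conn, index=False, if_exists="replace")


def query(sql, conn=conn):
    """Run a SELECT and return the result as a DataFrame with named columns."""
    return pd.read_sql_query(sql, conn)


open_issues = query("SELECT COUNT(*) AS open_issues FROM issues WHERE state = 'open';")
print(open_issues)
```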

# Questions:

## How many issues are currently open in the avocado-toast repository?

In [33]:
c.execute("""SELECT COUNT(*) AS open_issues
             FROM issues
             WHERE state = 'open';""")
df1 = pd.DataFrame(c.fetchall())
df1.columns = [i[0] for i in c.description]  # column names from the cursor
df1.head()

| | open_issues |
|---|---|
| 0 | 736 |


## Which month had the most issues created in the avocado-toast repo?

In [None]:
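# Group issues by year-month (created_at is stored as 'YYYY-MM-DD' text),
# count them, and keep the busiest month
c.execute("""SELECT strftime('%Y-%m', created_at) AS month,
                    COUNT(*) AS num_issues
             FROM issues
             GROUP BY month
             ORDER BY num_issues DESC
             LIMIT 1;""")
df2 = pd.DataFrame(c.fetchall())
df2.columns = [i[0] for i in c.description]
df2.head()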
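
The same year-month count can also be sketched in pandas, which is handy for cross-checking a SQL answer once `created_at` is a datetime. The dates below are hypothetical toy values, not the repository's data:

```python
import pandas as pd

# Hypothetical creation dates standing in for issues.csv.
created = pd.to_datetime(pd.Series(
    ["2016-05-12", "2016-05-20", "2016-06-01", "2017-05-03", "2016-05-30"]
))

# Bucket each date into its calendar year-month, then count issues per bucket.
per_month = created.dt.to_period("M").value_counts()

busiest_month = per_month.idxmax()  # the Period with the highest count
print(busiest_month, per_month.max())
```

Grouping by year-month, rather than by month alone, keeps May 2016 and May 2017 apart; `created.dt.month` would pool them if the question is meant as "which calendar month".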

## Which month had the most bugs filed?

In [None]:
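# Group issues labelled "bug" by year-month and keep the busiest month
# (to_sql stores pandas booleans as 1/0 in SQLite)
c.execute("""SELECT strftime('%Y-%m', created_at) AS month,
                    COUNT(*) AS num_bugs
             FROM issues
             WHERE label_bug = 1
             GROUP BY month
             ORDER BY num_bugs DESC
             LIMIT 1;""")
df3 = pd.DataFrame(c.fetchall())
df3.columns = [i[0] for i in c.description]
df3.head()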
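
Filtering on `label_bug` in SQL depends on how pandas writes booleans to SQLite. SQLite has no separate boolean storage class, and `to_sql` stores `True`/`False` as the integers 1/0, so the filter compares against 1. A quick check on hypothetical toy rows in an in-memory database:

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows: two bugs filed in May 2016, one in June 2016.
toy = pd.DataFrame({
    "created_at": ["2016-05-12", "2016-05-20", "2016-06-01", "2016-06-02"],
    "label_bug": [True, True, True, False],
})

conn = sqlite3.connect(":memory:")
toy.to_sql("issues", conn, index=False)

# Booleans arrive in SQLite as 1/0.
stored = conn.execute("SELECT DISTINCT label_bug FROM issues ORDER BY label_bug;").fetchall()

# Bugs per year-month, busiest first.
bug_months = conn.execute("""SELECT strftime('%Y-%m', created_at) AS month,
                                    COUNT(*) AS num_bugs
                             FROM issues
                             WHERE label_bug = 1
                             GROUP BY month
                             ORDER BY num_bugs DESC;""").fetchall()
print(stored, bug_months)
```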

## What is the most used emoji reaction in the avocado-toast repo?

In [None]:
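# Total each reaction column across all issues; the largest total is the most used emoji
c.execute("""SELECT SUM(reaction_eyes) AS eyes, SUM(reaction_rocket) AS rocket,
                    SUM(reaction_thinking_face) AS thinking_face, SUM(reaction_thumbs_up) AS thumbs_up,
                    SUM(reaction_heart) AS heart, SUM(reaction_tada) AS tada,
                    SUM(reaction_thumbs_down) AS thumbs_down, SUM(reaction_smile) AS smile
             FROM issues;""")
df4 = pd.DataFrame(c.fetchall())
df4.columns = [i[0] for i in c.description]
df4.T.sort_values(0, ascending=False)  # one row per emoji, most used first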
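
In pandas, the most used emoji is the `reaction_*` column with the largest total over all issues. A sketch with hypothetical toy counts that also strips the `reaction_` prefix for readability:

```python
import pandas as pd

# Hypothetical toy reaction counts for three issues.
toy = pd.DataFrame({
    "reaction_eyes": [0, 1, 0],
    "reaction_heart": [1, 0, 1],
    "reaction_thumbs_up": [2, 5, 0],
})

# Sum each emoji column over all issues, then drop the "reaction_" prefix.
totals = toy.filter(regex=r"^reaction_").sum()
totals.index = totals.index.str.replace("reaction_", "", regex=False)

most_used = totals.idxmax()
print(totals.sort_values(ascending=False))
print("Most used:", most_used)
```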

## How many issues with more than 5 comments had each commenter leave exactly one comment?

In [None]:
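# "Each commenter left exactly one comment" means num_comments = num_commenters
c.execute("""SELECT COUNT(*) AS num_issues
             FROM issues
             WHERE num_comments > 5
               AND num_comments = num_commenters;""")
df5 = pd.DataFrame(c.fetchall())
df5.columns = [i[0] for i in c.description]
df5.head()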
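
The condition is worth spelling out: if every commenter left exactly one comment, the number of comments equals the number of commenters. Filtering on `num_commenters == 1` instead finds issues where a single person wrote all the comments. A toy check on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy rows (made-up comment and commenter counts).
toy = pd.DataFrame({
    "num_comments": [6, 6, 8, 3],
    "num_commenters": [6, 1, 5, 3],
})

# More than 5 comments, and one comment per commenter.
mask = (toy["num_comments"] > 5) & (toy["num_comments"] == toy["num_commenters"])
print(mask.sum())

# The num_commenters == 1 reading matches a different issue: one person, six comments.
wrong = (toy["num_comments"] > 5) & (toy["num_commenters"] == 1)
```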

## Which issue is the most popular in the avocado-toast repo?

In [None]:
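# Popularity proxy (a judgement call): total emoji reactions on the issue,
# with the comment count as a tie-breaker
c.execute("""SELECT number, state, num_comments,
                    reaction_eyes + reaction_rocket + reaction_thinking_face
                    + reaction_thumbs_up + reaction_heart + reaction_tada
                    + reaction_thumbs_down + reaction_smile AS total_reactions
             FROM issues
             ORDER BY total_reactions DESC, num_comments DESC
             LIMIT 5;""")
df6 = pd.DataFrame(c.fetchall())
df6.columns = [i[0] for i in c.description]
df6.head()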
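
One way to turn "most emojis" into a ranking is to sum every `reaction_*` column per issue and sort, using the comment count to break ties. A sketch on hypothetical toy issues:

```python
import pandas as pd

# Hypothetical toy issues (made-up values).
toy = pd.DataFrame({
    "number": [10, 11, 12, 13],
    "num_comments": [3, 9, 1, 4],
    "reaction_thumbs_up": [5, 2, 0, 4],
    "reaction_heart": [1, 3, 0, 2],
    "reaction_thumbs_down": [0, 1, 0, 0],
})

# One possible popularity score: total emoji reactions per issue,
# summed across every column whose name starts with "reaction_".
toy["total_reactions"] = toy.filter(regex=r"^reaction_").sum(axis=1)

# Highest total first; the comment count breaks ties.
ranked = toy.sort_values(["total_reactions", "num_comments"], ascending=False)
most_popular = ranked["number"].iloc[0]
print(ranked[["number", "total_reactions", "num_comments"]])
```

Issues 10, 11 and 13 tie on reactions here, which is why the tie-breaker matters. Whether thumbs-down reactions should count toward popularity at all is a judgement call.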

## Which issue is the least popular in the avocado-toast repo?

In [None]:
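# "Least popular" read as most disliked: thumbs-down minus positive reactions.
# (Ranking by least interaction instead may return many issues tied at zero.)
c.execute("""SELECT number, state, reaction_thumbs_down,
                    reaction_thumbs_down - (reaction_thumbs_up + reaction_heart
                    + reaction_tada + reaction_rocket + reaction_smile) AS net_dislike
             FROM issues
             ORDER BY net_dislike DESC
             LIMIT 5;""")
df7 = pd.DataFrame(c.fetchall())
df7.columns = [i[0] for i in c.description]
df7.head()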
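
There are two readings of "least popular": the least interaction, or the most thumbs-down reactions. Checking how many issues tie at the minimum interaction helps decide between them, since a large tie makes the first reading uninformative. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy issues: comments plus two reaction columns.
toy = pd.DataFrame({
    "number": [1, 2, 3, 4, 5],
    "num_comments": [0, 0, 4, 0, 1],
    "reaction_thumbs_up": [0, 0, 2, 0, 0],
    "reaction_thumbs_down": [0, 0, 0, 0, 3],
})

# Total interaction = comments + all reactions.
interaction = toy["num_comments"] + toy.filter(regex=r"^reaction_").sum(axis=1)

# How many issues share the minimum interaction?
n_tied = (interaction == interaction.min()).sum()

# Alternative reading: the issue with the most thumbs-down reactions.
most_disliked = toy.loc[toy["reaction_thumbs_down"].idxmax(), "number"]
print(n_tied, most_disliked)
```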

## Which issue is the most controversial in the avocado-toast repo?

In [None]:
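# "Controversial" = strong reactions on both sides: rank by the smaller of the
# thumbs-down count and the positive count (thumbs up, heart, tada, rocket, smile)
c.execute("""SELECT *, MIN(thumbs_down, positive) AS controversy
             FROM (SELECT number, state, reaction_thumbs_down AS thumbs_down,
                          reaction_thumbs_up + reaction_heart + reaction_tada
                          + reaction_rocket + reaction_smile AS positive
                   FROM issues)
             ORDER BY controversy DESC
             LIMIT 5;""")
df8 = pd.DataFrame(c.fetchall())
df8.columns = [i[0] for i in c.description]
df8.head()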
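
A controversial issue draws many thumbs-down reactions but also a high positive count. One way to encode "high on both sides" is to score each issue by the smaller of its positive and negative totals. A sketch on hypothetical rows, treating only thumbs up and heart as positive for brevity:

```python
import pandas as pd

# Hypothetical toy issues (made-up reaction counts).
toy = pd.DataFrame({
    "number": [20, 21, 22],
    "reaction_thumbs_up": [30, 12, 0],
    "reaction_heart": [5, 3, 0],
    "reaction_thumbs_down": [1, 10, 25],
})

positive = toy["reaction_thumbs_up"] + toy["reaction_heart"]
negative = toy["reaction_thumbs_down"]

# Controversy score: the smaller of the two sides, so it is only high
# when an issue draws many positive AND many negative reactions.
toy["controversy"] = pd.concat([positive, negative], axis=1).min(axis=1)

most_controversial = toy.loc[toy["controversy"].idxmax(), "number"]
print(toy[["number", "controversy"]])
```

Issue 22 has the most thumbs-down reactions but no positive ones, so it scores 0: it reads as unpopular rather than controversial.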

## If an AvocadoCorp employee working on avocado-toast asked you which issue they should work on next, what would you recommend, and why?

In [None]:
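# Two candidate answers: the most recent open issue, or the open issue with the
# most comments and interactions. This ranks open issues by engagement
# (comments + thumbs-up), newest first among ties.
c.execute("""SELECT number, created_at, num_comments, reaction_thumbs_up,
                    label_bug, label_help_wanted
             FROM issues
             WHERE state = 'open'
             ORDER BY num_comments + reaction_thumbs_up DESC, created_at DESC
             LIMIT 10;""")
df9 = pd.DataFrame(c.fetchall())
df9.columns = [i[0] for i in c.description]
df9.head(10)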
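
The two candidate answers (the most recent open issue, and the open issue with the most comments and interactions) can be computed side by side to see whether they agree. A sketch on hypothetical toy issues, using comments plus thumbs-up as the engagement measure (an assumption):

```python
import pandas as pd

# Hypothetical toy issues (made-up values).
toy = pd.DataFrame({
    "number": [100, 101, 102, 103],
    "created_at": pd.to_datetime(["2020-10-01", "2020-11-20", "2020-09-15", "2020-11-25"]),
    "state": ["open", "open", "open", "closed"],
    "num_comments": [2, 1, 14, 30],
    "reaction_thumbs_up": [0, 1, 9, 40],
})

open_issues = toy[toy["state"] == "open"]

# Candidate 1: the most recently created open issue.
newest = open_issues.loc[open_issues["created_at"].idxmax(), "number"]

# Candidate 2: the open issue with the most engagement (comments + thumbs-up).
engagement = open_issues["num_comments"] + open_issues["reaction_thumbs_up"]
most_engaged = open_issues.loc[engagement.idxmax(), "number"]
print(newest, most_engaged)
```

Here the two candidates differ, and issue 103 is excluded despite being the newest and most engaged because it is already closed.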