# "your_off-facebook_activity" Report - Instructions
### EMAT 22110 - Data in Emerging Media and Technology
### Author: David E. Silva
### Created: 7/21/20
### Last Updated: 1/18/21
<img src="https://media.giphy.com/media/3ohc14lCEdXHSpnnSU/giphy.gif">

#### Purpose
**The analysis step of the loop covers a wide variety of techniques to bring sense to data. This is an investigative and reflective process, not a canned sequence of actions. The most common error is matching the wrong analysis to the data in hand. In this assignment you will showcase your ability to conduct a variety of analyses and appropriately apply the analysis tools covered in class. You will submit a final report documenting your analysis so it can be reproduced by others and displaying evidence to support the conclusions you draw from the data.**

Before attempting this assignment, you will need to complete the Reporting Assignment, download your personal Facebook data, and review the content from "Focus on Analysis."

To complete this assignment follow the in-class example to:
1. Open and <a href="https://docs.python.org/3/library/json.html">load</a> the <a href="https://www.json.org/json-en.html">JSON</a> file titled "your_off-facebook_activity.json"
2. Convert the data in "your_off-facebook_activity.json" to a DataFrame object using Pandas
3. Complete steps 1 through 4 from “<a href="https://psyarxiv.com/r8g7c/">The Eight Steps of Data Analysis</a>” by Dustin Fife.
4. Select and conduct the appropriate statistical summaries and analyses for the data type (for example, from scipy.stats: <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html#scipy.stats.describe">describe()</a>, <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html#scipy.stats.chisquare">chisquare()</a>, <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind">ttest_ind()</a>, or <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html#scipy.stats.f_oneway">f_oneway()</a>).
5. Choose an appropriate visualization to aid in interpreting the chosen summaries and analyses.

Then write a complete report of the data that:

6. Provides an overview that clearly states the driving question and links the question to the data approach
7. Describes the raw data structure and data types used in the analysis
8. Documents the wrangling and analysis of the data
9. Includes a clear and appropriate visualization
10. Draws a data-driven conclusion that addresses the original question
11. Reflects on limitations, alternative approaches, and next steps
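
As an illustration of step 4, the sketch below runs a chi-square goodness-of-fit test on counts of event types. The counts are made-up placeholder values, not real Facebook data, and the variable names are assumptions. Counts per category suit `chisquare()`; comparing the means of two independent groups would instead call for `ttest_ind()`, and three or more groups for `f_oneway()`.

```python
from scipy import stats

# Hypothetical event counts for one app (placeholder values, not real data).
# Order: ACTIVATE_APP, CUSTOM, SEARCH, VIEW_CONTENT
observed = [48, 30, 12, 10]

# With no expected frequencies given, chisquare() tests the null hypothesis
# that every category is equally likely.
result = stats.chisquare(observed)
chi2, p_value = result.statistic, result.pvalue

print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value here would suggest the event types do not occur equally often; whether that is the right comparison depends on your driving question.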

## Example
### 1. Overview
Driving question: How is your Facebook information connected to non-Facebook apps? How frequently do these apps request your Facebook data? Which apps request Facebook data most often?

In [2]:
import json
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import time
import pandas as pd


In [3]:
with open("your_off-facebook_activity.json") as f:
    act = json.load(f)

act

{'off_facebook_activity': [{'name': 'U.S. Bank - Inspired by customers',
   'events': [{'id': 1591992844216198,
     'type': 'ACTIVATE_APP',
     'timestamp': 1594274368},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1594239280},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1594156238},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1594066670},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593919348},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593652182},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593651360},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593273365},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593127728},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593100373},
    {'id': 1591992844216198, 'type': 'ACTIVATE_APP', 'timestamp': 1593095964},
    {'id': 1591992844216198, 'type': '

### 2. The Data
Describe the raw data structure and data types. Where did this data come from? Do you trust this data? Are there any concerns using this data?
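
One possible way to inspect the structure and data types is to flatten the nested JSON into one row per event. The sketch below uses a small hypothetical sample that mirrors the layout of the export (app names and ids are invented), so it runs without your personal file; `sample` and `events_df` are illustrative names.

```python
import pandas as pd

# Hypothetical sample mirroring the export's nested layout:
# a list of apps, each with a name and a list of event dictionaries.
sample = {
    "off_facebook_activity": [
        {"name": "Example Bank", "events": [
            {"id": 1, "type": "ACTIVATE_APP", "timestamp": 1594274368},
            {"id": 1, "type": "ACTIVATE_APP", "timestamp": 1594239280},
        ]},
        {"name": "Example Store", "events": [
            {"id": 2, "type": "PAGE_VIEW", "timestamp": 1593652182},
        ]},
    ]
}

# One row per event; the app name is repeated on each of its events.
events_df = pd.json_normalize(
    sample["off_facebook_activity"], record_path="events", meta=["name"]
)

# Convert Unix seconds to pandas datetimes so dates are readable.
events_df["timestamp"] = pd.to_datetime(events_df["timestamp"], unit="s")

print(events_df.dtypes)
print(events_df.head())
```

With your own data, replacing `sample` with the loaded `act` dictionary should give the same column layout, assuming your export has the same keys.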

### 3. Data Preparation
Describe each step taken to wrangle, summarize, or modify your data.
If you have removed data due to privacy concerns or any other reason, you do not need to describe what was removed.

In [4]:
#Activity frequency by app
apps = []
events = []
actapp = []
custom = []
search = []
viewc = []
ad = []
view = []
for i in act['off_facebook_activity']:
    apps.append(i['name'])
    events.append((len(i['events'])))
    actapp.append(len([x for x in i['events'] if x['type'] == 'ACTIVATE_APP']))
    custom.append(len([x for x in i['events'] if x['type'] == 'CUSTOM']))
    search.append(len([x for x in i['events'] if x['type'] == 'SEARCH']))
    viewc.append(len([x for x in i['events'] if x['type'] == 'VIEW_CONTENT']))
    ad.append(len([x for x in i['events'] if x['type'] == 'AD_REQUEST']))
    view.append(len([x for x in i['events'] if x['type'] == 'PAGE_VIEW']))

In [5]:
appbyevent = pd.DataFrame(columns = ('App', 'Event Count', 'Activate App', 'Custom', 'Search', 'View Content', 'Ad Request', 'Page Views'))
appbyevent['App'] = apps
appbyevent['Event Count'] = events
appbyevent['Activate App'] = actapp
appbyevent['Custom'] = custom
appbyevent['Search'] = search
appbyevent['View Content'] = viewc
appbyevent['Ad Request'] = ad
appbyevent['Page Views'] = view

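
As an alternative to filling separate lists by hand, a table of apps by event type could also be built with `pd.crosstab`. This is only a sketch on a hypothetical sample (invented app names); `pairs`, `long_df`, and `counts` are illustrative names, not part of the in-class example.

```python
import pandas as pd

# Hypothetical sample with the same nesting as the export.
sample = {
    "off_facebook_activity": [
        {"name": "Example Store", "events": [
            {"id": 2, "type": "PAGE_VIEW", "timestamp": 1593652182},
        ]},
        {"name": "Example Bank", "events": [
            {"id": 1, "type": "ACTIVATE_APP", "timestamp": 1594274368},
            {"id": 1, "type": "ACTIVATE_APP", "timestamp": 1594239280},
            {"id": 1, "type": "CUSTOM", "timestamp": 1594156238},
        ]},
    ]
}

# Build one (app, event type) pair per event.
pairs = [
    (app["name"], event["type"])
    for app in sample["off_facebook_activity"]
    for event in app["events"]
]
long_df = pd.DataFrame(pairs, columns=["App", "Type"])

# Count events for each app and type; missing combinations become 0.
counts = pd.crosstab(long_df["App"], long_df["Type"])

# Add a total per app and sort so the most active apps come first.
counts["Event Count"] = counts.sum(axis=1)
counts = counts.sort_values("Event Count", ascending=False)

print(counts)
```

This approach picks up every event type present in the data, so types that were not anticipated in advance still get their own column.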

### 4. Data Visualization
You must include at least one appropriate visualization of the data. Your visualization cannot be an exact replication of the example below.

In [6]:


appbyevent = appbyevent.sort_values(by = ['Event Count'], ascending = False)
X = np.arange(30)
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.bar(X + 0.00, appbyevent['Event Count'][0:30], color = 'b', alpha = .1, width = .90)
ax.bar(X + 0.00, appbyevent['Activate App'][0:30], color='b', width = .15)
ax.bar(X + 0.15, appbyevent['Custom'][0:30], color='g', width = .15)
ax.bar(X + 0.30, appbyevent['Page Views'][0:30], color='r', width = .15)
ax.bar(X + 0.45, appbyevent['View Content'][0:30], color='c', width = .15)
ax.bar(X+.60, appbyevent['Ad Request'][0:30], color = 'y', width = .15)
plt.xticks(ticks=X+.2, labels = list(appbyevent['App'][0:30]), rotation=90)
ax.legend(labels=['Total Events' ,'Activate App', 'Custom', 'Page Views', 'View Content', 'Ad Request'])
plt.show()


In [7]:
X = np.arange(30)
plt.bar(appbyevent['App'][0:29], appbyevent['Activate App'][0:29], color='b', width = .25)
plt.bar(appbyevent['App'][0:29], appbyevent['Custom'][0:29], color='c', width = .25)
plt.bar(appbyevent['App'][0:29], appbyevent['Search'][0:29], color='g', width = .25)
plt.bar(appbyevent['App'][0:29], appbyevent['View Content'][0:29], color='y', width = .25)
plt.xticks(rotation=90)
plt.show()


You may want to explore frequencies over time. The code below may give you a start.

In [8]:
#Timeline of activity
#Index 3 selects the fourth app in the example data (presumably TikTok, given the variable names); change it to match your own file
tiktoktimes = [event['timestamp'] for event in act['off_facebook_activity'][3]['events']]

#Convert the Unix timestamps to datetime objects
#tiktokutc = []
#for i in tiktoktimes:
#    tiktokutc.append(datetime.utcfromtimestamp(i))

#startdate = datetime(2020,1,29)
#enddate = datetime(2020, 8, 1)
#daylength = (enddate -  startdate).days

#Store the ISO year and week number for each event
#tik = pd.DataFrame()
#tik['times'] = tiktokutc
#tik['date_year'] = tik['times'].apply(lambda x: x.isocalendar()[0])
#tik['date_week'] = tik['times'].apply(lambda x: x.isocalendar()[1])

#Count events per week and plot the counts
#week_groups = tik.groupby(['date_year', 'date_week'])['times'].count()

#week_groups.plot(kind='bar',figsize=(10,5),legend=None)
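
One way to count events per week, sketched with only the standard library: convert each Unix timestamp to a UTC datetime and tally by ISO year and week. The timestamps are taken from the sample output earlier in this example; the names `timestamps` and `weeks` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# A few Unix timestamps (seconds) from the sample output shown earlier.
timestamps = [1594274368, 1594239280, 1594156238, 1593652182, 1593651360]

weeks = Counter()
for ts in timestamps:
    # Timezone-aware conversion; datetime.utcfromtimestamp() is
    # deprecated as of Python 3.12.
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    iso = moment.isocalendar()
    # Key each event by (ISO year, ISO week number).
    weeks[(iso[0], iso[1])] += 1

for (year, week), n in sorted(weeks.items()):
    print(f"{year} week {week}: {n} events")
```

Keying on both year and week keeps week numbers from different years from being merged if your data spans more than one year.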

### 5. Conclusions
Draw at least two data-driven conclusions from the above analysis and visualization. What did you learn? Were there answers to your driving questions?

### 6. Limitations, Alternative Approaches, & Next Steps
Note any limitations of this analysis or other ways of looking at the same data. Are there new questions raised by this analysis? What could be done next to continue adding new understanding to this topic?