# Likes Report
### Joseph Rush

#### Hypothesis
Dr. Silva is a busy professor who maintains multiple social media accounts and has said that his Instagram use is intermittent. If that self-report is accurate, the timestamps of his likes will show no more than a few interactions in any given day or month.

The main application of this hypothesis is as a case study in social media. People often misreport their social media usage, whether intentionally or not. Some of this is by design, since social media companies want people not to notice how long they spend on the platforms. Studies of whether an individual or group accurately recalls their usage therefore provide a useful diagnostic tool for assessing the psychology of social media.

The available data suits this use case because liking a post is a fundamental, low-effort Instagram interaction: it carries a low mental cost, is easy to perform, and is emphasized in the UI.

## Data Collection

Instagram collected this data and made it available for download. Instagram gathers it primarily to analyze user activity and to tune recommendations and advertising more accurately to individual users. It reliably reports interactions with individual accounts, but for the purposes of the hypothesis it is missing crucial information: comments and shares, the other two primary interactions, are absent. If Dr. Silva is the sort of user who avoids liking posts, or likes posts only rarely, this data will not provide a complete picture of his Instagram interactions.

Dr. Silva also provided this data himself, so it is possible that he altered it in some way, for example to excise interactions with particular accounts, but this seems doubtful.

First, we load the data and place it in a pandas DataFrame.

In [33]:
import json

import pandas as pd

# Load the Instagram likes export from disk.
with open(r"E:\EMAT32110-DataInEmergingMediaAndTechnology\230911-technogecko_20200714_toshare\likes.json") as likes_json:
    likes_data = json.load(likes_json)

# Column 0 holds the ISO 8601 timestamp of each like, column 1 the account name.
media_likes = pd.DataFrame(likes_data['media_likes'])
media_likes

|  | 0 | 1 |
|---|---|---|
| 0 | 2020-07-11T04:39:28+00:00 | ball_doesnt_lie |
| 1 | 2020-07-11T04:39:05+00:00 | ball_doesnt_lie |
| 2 | 2020-07-05T17:25:44+00:00 | ali_saurusrex |
| 3 | 2020-07-03T03:40:02+00:00 | cacandassociates |
| 4 | 2020-06-25T17:41:50+00:00 | cacandassociates |
| ... | ... | ... |
| 330 | 2013-02-05T02:58:46+00:00 | natgeo |
| 331 | 2013-02-05T02:22:24+00:00 | aroseroar16 |
| 332 | 2013-02-05T01:29:31+00:00 | aroseroar16 |
| 333 | 2013-02-04T17:42:04+00:00 | ali_saurusrex |
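
The integer column labels `0` and `1` are easy to misread. As a minimal sketch, assuming the export keeps the `[timestamp, account]` pair layout shown in the output above, the same load can name the columns and parse the timestamps. The names `liked_at` and `account` are hypothetical labels chosen for illustration, and the in-memory sample stands in for the real file.

```python
import io
import json

import pandas as pd

# A few rows copied from the output above; the real export has 334 rows.
sample_json = io.StringIO(json.dumps({
    "media_likes": [
        ["2020-07-11T04:39:28+00:00", "ball_doesnt_lie"],
        ["2020-07-11T04:39:05+00:00", "ball_doesnt_lie"],
        ["2020-07-05T17:25:44+00:00", "ali_saurusrex"],
        ["2013-02-04T17:42:04+00:00", "ali_saurusrex"],
    ]
}))

likes_data = json.load(sample_json)

# "liked_at" and "account" are hypothetical column names, not part of the export.
named_likes = pd.DataFrame(likes_data["media_likes"], columns=["liked_at", "account"])

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps.
named_likes["liked_at"] = pd.to_datetime(named_likes["liked_at"], utc=True)
```

With real timestamps in place, date parts are available through the `.dt` accessor instead of string slicing.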


Next, we extract the date (year, month, and day) from each timestamp and store it in a new column, so that we can group the data by day and count the interactions on each day.

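In [31]:
# The first 10 characters of each ISO 8601 timestamp are the date (YYYY-MM-DD).
timestamps_list = [x[0:10] for x in media_likes[0]]

In [28]:
media_likes['timestamps'] = timestamps_list
# Count the likes on each day and list the busiest days first.
media_likes.groupby('timestamps').count().sort_values([0], ascending=False)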

| timestamps | 0 | 1 |
|---|---|---|
| 2017-06-16 | 5 | 5 |
| 2016-03-27 | 4 | 4 |
| 2016-08-16 | 4 | 4 |
| 2014-06-07 | 4 | 4 |
| 2017-03-22 | 4 | 4 |
| ... | ... | ... |
| 2017-01-09 | 1 | 1 |
| 2017-01-16 | 1 | 1 |
| 2017-01-19 | 1 | 1 |
| 2017-02-05 | 1 | 1 |
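
The hypothesis also mentions months, which the grouping above does not tally. The following is a minimal sketch of how per-day and per-month counts could be computed from parsed timestamps; it is not the analysis that was run for this report. The five timestamps are copied from the first output table, and the variable names are hypothetical.

```python
import pandas as pd

# Sample timestamps copied from the first output table; not the full data set.
sample = pd.Series(pd.to_datetime([
    "2020-07-11T04:39:28+00:00",
    "2020-07-11T04:39:05+00:00",
    "2020-07-05T17:25:44+00:00",
    "2020-07-03T03:40:02+00:00",
    "2020-06-25T17:41:50+00:00",
], utc=True))

# Likes per calendar day (UTC), busiest days first.
likes_per_day = sample.dt.strftime("%Y-%m-%d").value_counts()

# Likes per calendar month (UTC), in chronological order.
likes_per_month = sample.dt.strftime("%Y-%m").value_counts().sort_index()
```

Note that the timestamps carry a `+00:00` offset, so both groupings use UTC day boundaries. If local days matter, the timestamps could be shifted with `dt.tz_convert` before grouping.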


The data spans February 4, 2013 to July 11, 2020, a period of 2,715 days. In that time, Dr. Silva used the like button on Instagram on only 271 days, or about one Instagram session every 10 days. On each of those days, he had between 1 and 5 interactions. This agrees with his self-report as an intermittent Instagram user, so in this case our test subject has accurately reported his usage. Ideally we would have a record of all interactions, including comments and shares, or even the data Instagram undoubtedly has on his total screen time in the app, but likes serve as a suitable proxy for this sort of activity survey.
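
The arithmetic behind these figures can be checked with a short sketch. The dates and the count of 271 active days come from the text above, and the total of 334 likes is read off the row index (0 to 333) of the first output table. The variable names are hypothetical.

```python
from datetime import date

first_like = date(2013, 2, 4)
last_like = date(2020, 7, 11)

# Count both the first and the last day, so add 1 to the difference.
span_days = (last_like - first_like).days + 1  # 2715

active_days = 271  # days with at least one like, as stated in the report
total_likes = 334  # rows 0 to 333 in the likes table

# Average gap between active days, and average likes on an active day.
days_per_active_day = span_days / active_days
likes_per_active_day = total_likes / active_days
```

Under these figures, an active day occurs roughly once every 10 days and carries a little over one like on average.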