# Instagram Likes Report
### By: Logan Jackson
#### Sept. 20, 2022

In this report I will be looking at my professor's Instagram "likes" data in order to determine what person and/or topic is most interesting to him. Specifically, I hypothesize that a higher number of likes will indicate a higher interest in that person/thing. 

From what I understand, this information is collected automatically by Instagram's systems and is mostly accessed by whatever algorithm they use. The typical application for this data is deciding which advertisements to push to a user, since targeting by past likes increases the chance that an advertisement is something the user would want or need. It also helps when deciding which notifications to push to the user in order to get them to check the app again.

This data can be fairly reliable since the user is directly telling Instagram the things that they like, as well as what they didn't like by simply scrolling past, which is tracked in the "seen_content.json" file. However, this data could also be unreliable at times since the user might not 'like' things every time they actually enjoy seeing them. There could also be reasons why a user might 'like' something they do not actually enjoy, such as a misclick.

In [1]:
import pandas as pd
import json

The "likes.json" file is most likely the ideal data to use for my analysis. However, "connections.json" would also be helpful alongside it since it shows which accounts he follows.

In [2]:
with open(r"C:\Users\ltjac\EMAT-Fall-22\technogecko_20200714_toshare\likes.json") as j:
    ig_dat_likes = json.load(j)

In [3]:
ig_dat_likes

{'media_likes': [['2020-07-11T04:39:28+00:00', 'ball_doesnt_lie'],
  ['2020-07-11T04:39:05+00:00', 'ball_doesnt_lie'],
  ['2020-07-05T17:25:44+00:00', 'ali_saurusrex'],
  ['2020-07-03T03:40:02+00:00', 'cacandassociates'],
  ['2020-06-25T17:41:50+00:00', 'cacandassociates'],
  ['2020-06-22T23:01:55+00:00', 'reams_esq'],
  ['2020-06-08T15:05:46+00:00', 'emmyr0o'],
  ['2020-06-07T12:46:29+00:00', 'ali_saurusrex'],
  ['2020-06-02T01:03:28+00:00', 'colin_storm'],
  ['2020-05-25T16:38:14+00:00', 'ali_saurusrex'],
  ['2020-05-19T23:38:40+00:00', 'colin_storm'],
  ['2020-05-18T13:42:30+00:00', 'emmyr0o'],
  ['2020-05-14T13:51:03+00:00', 'emmyr0o'],
  ['2020-05-12T21:31:12+00:00', 'cacandassociates'],
  ['2020-05-11T05:07:31+00:00', 'inalull'],
  ['2020-05-07T18:07:52+00:00', 'reams_esq'],
  ['2020-05-06T00:33:58+00:00', 'inalull'],
  ['2020-04-30T19:54:48+00:00', 'emmyr0o'],
  ['2020-04-28T12:58:14+00:00', 'inalull'],
  ['2020-04-28T03:28:35+00:00', 'cacandassociates'],
  ['2020-04-26T06:41:17

This data is still messy, so I'll arrange it into a DataFrame.

In [4]:
ig_likes_df = pd.DataFrame(ig_dat_likes['media_likes'])

In [5]:
ig_likes_df

Unnamed: 0,0,1
0,2020-07-11T04:39:28+00:00,ball_doesnt_lie
1,2020-07-11T04:39:05+00:00,ball_doesnt_lie
2,2020-07-05T17:25:44+00:00,ali_saurusrex
3,2020-07-03T03:40:02+00:00,cacandassociates
4,2020-06-25T17:41:50+00:00,cacandassociates
...,...,...
330,2013-02-05T02:58:46+00:00,natgeo
331,2013-02-05T02:22:24+00:00,aroseroar16
332,2013-02-05T01:29:31+00:00,aroseroar16
333,2013-02-04T17:42:04+00:00,ali_saurusrex
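
The default column labels `0` and `1` are easy to mix up, so one option is to name the columns and parse the timestamps when building the DataFrame. This is only a minimal sketch: the column names `timestamp` and `account` are my own choice, and the four rows are copied by hand from the output above rather than loaded from the file.

```python
import pandas as pd

# A few rows copied from the output above; the real notebook would pass
# ig_dat_likes['media_likes'] instead of this hand-typed sample.
sample_likes = [
    ['2020-07-11T04:39:28+00:00', 'ball_doesnt_lie'],
    ['2020-07-11T04:39:05+00:00', 'ball_doesnt_lie'],
    ['2020-07-05T17:25:44+00:00', 'ali_saurusrex'],
    ['2020-07-03T03:40:02+00:00', 'cacandassociates'],
]

# "timestamp" and "account" are names chosen here for readability (an assumption,
# not something the export itself provides).
likes_named = pd.DataFrame(sample_likes, columns=["timestamp", "account"])

# Convert the ISO 8601 strings into timezone-aware datetimes.
likes_named["timestamp"] = pd.to_datetime(likes_named["timestamp"], utc=True)

print(likes_named.dtypes)
```

With real datetimes in place, the likes could later be grouped by year or month if the timing of the likes ever becomes relevant.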


The DataFrame helps display the data, but this does not help discern who/what my professor likes the most. Counting the likes per user and sorting them from greatest to least should be enough to draw conclusions from.  

In [6]:
ig_likes_df.groupby(1).count().nlargest(58,0)

Unnamed: 0_level_0,0
1,Unnamed: 1_level_1
ali_saurusrex,68
aroseroar16,29
orangekoala2,27
a_matt_silva,18
reams_esq,16
emmyr0o,16
cacandassociates,15
danneabreanne,13
colin_storm,11
inalull,10
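
A similar tally can also be sketched with `value_counts`, which counts and sorts in one step. This is an alternative to the `groupby` call above, not a replacement for it. It assumes the account column has been named `account` (a hypothetical name), and it runs on the first eight rows of the raw output, so the counts it prints are for that sample only.

```python
import pandas as pd

# First eight rows of the raw "media_likes" output shown earlier (sample only).
sample_likes = [
    ['2020-07-11T04:39:28+00:00', 'ball_doesnt_lie'],
    ['2020-07-11T04:39:05+00:00', 'ball_doesnt_lie'],
    ['2020-07-05T17:25:44+00:00', 'ali_saurusrex'],
    ['2020-07-03T03:40:02+00:00', 'cacandassociates'],
    ['2020-06-25T17:41:50+00:00', 'cacandassociates'],
    ['2020-06-22T23:01:55+00:00', 'reams_esq'],
    ['2020-06-08T15:05:46+00:00', 'emmyr0o'],
    ['2020-06-07T12:46:29+00:00', 'ali_saurusrex'],
]

# Column names are assumptions made for this sketch.
likes = pd.DataFrame(sample_likes, columns=["timestamp", "account"])

# value_counts() returns likes per account, sorted from most to fewest.
like_counts = likes["account"].value_counts()

print(like_counts)
```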


Figuring out how to get the data into a form that was easily usable was a bit of a challenge since the file only lists each individual 'like' as a timestamp and an account name, meaning that I could not simply take the data from one column and instantly sort it. Once I figured it out, though, the analysis became pretty simple.

From the data, it is clear that one account gets much more attention than any other, that account being ali_saurusrex with 68 liked posts. The next closest are aroseroar16 and orangekoala2 with 29 and 27 liked posts respectively. 

I would conclude from this that my professor has the most interest in seeing what ali_saurusrex posts, followed by aroseroar16 and orangekoala2. He could have anywhere from a moderate to high interest in the accounts with 9-18 liked posts, and most likely a lower interest in any account with 5 or fewer liked posts. With this data, I would hypothesize that a notification about one of the top 3 people on this list would be much more likely to get my professor to check Instagram, whereas someone below the top 15 would be much less likely to successfully grab his attention.
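
To put the gap between the top accounts in perspective, the counts can be expressed as a share of all likes. The sketch below is a rough illustration: it uses the three counts from the table above and assumes a total of 334 likes, which I am inferring from the DataFrame's index running from 0 to 333.

```python
# Like counts for the top three accounts, taken from the table above.
top_counts = {"ali_saurusrex": 68, "aroseroar16": 29, "orangekoala2": 27}

# Assumed total: the DataFrame index runs from 0 to 333, i.e. 334 rows.
total_likes = 334

# Fraction of all likes that went to each of the top three accounts.
shares = {account: n / total_likes for account, n in top_counts.items()}

for account, share in shares.items():
    print(f"{account}: {share:.1%}")

# Combined share of the top three accounts.
top3_share = sum(top_counts.values()) / total_likes
print(f"top 3 combined: {top3_share:.1%}")
```

Under these assumptions, ali_saurusrex alone accounts for about a fifth of all likes, and the top three together account for a little over a third.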

There are some clear limitations on this data that might affect the efficacy of future tests and analyses, however. I only have access to names and numbers with this data. I have no knowledge about the people behind these accounts or what the context of each individual 'like' was. I have no idea what a post even contained that caused him to 'like' it. I am simply going off the assumption that an account that has been liked more than others would be more likely to produce the desired results if used as the basis for a test.

Ideally, the best next step would be to test this theory by altering the notifications received or analyzing any data that might already be collected about the click-through rates of notifications. To my knowledge, though, that data is not included in what Instagram provides (but it is most likely still being tracked). If I wanted to analyze more data to strengthen my hypothesis, my next step would probably be to look at the aforementioned "connections.json" file or anything else that would show specific interactions with certain users.

If I wanted to take an alternate approach entirely, though, I could use Instagram itself to find specific liked posts and try to find any patterns that emerge now knowing the context of each 'like.' Taking a look at more qualitative data directly from the app would help make a more informed decision on what would be most likely to draw the interest of the user. 