# Likes Report
## Ash S. Copeland (Holloway)
### 9/12/2024

The data in this report was downloaded from Instagram through its standard data export for my own personal account. Instagram creates this data to track user interaction across the platform, both for advertising/marketing and for general activity archives. It can be used to show relevant advertisements to the user, and Instagram's support and development teams can use the archive as a reference for account activity when resolving future issues.

This data can be considered both reliable and unreliable. It is reliable because it records all activity on the account without bias or exclusion. It is unreliable because it cannot combine a user's data across accounts. For instance, I often use multiple accounts for my different interests, so my data is splintered and less complete than it would be for a user with only a single account.

Before seeing the finalized and tidy data, it is important to first view the plain data. We accomplish this by opening the JSON file in which the data is stored. To do so, we first import the libraries needed to read the JSON file and build a useful dataframe from it. We begin by importing **json** and **pandas** below:

In [32]:
import json
import pandas as pd

Now that these libraries are imported, we can begin loading the plain data file into the notebook!

In [33]:
with open('liked_posts.json') as L:
    likes_raw = json.load(L)

And done! The data should now be loaded into the notebook, and we can begin working with it. We can confirm this by viewing the raw data below:

*[Data Display Removed for Privacy Reasons]*

Next we can use the ".keys()" method to see which top-level keys the data contains, and then use that to create a dataframe.

In [34]:
likes_raw.keys()

dict_keys(['likes_media_likes'])
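
Before building the dataframe, it can help to peek at the shape of a single record without printing any private content. The snippet below is a minimal sketch of that idea. It uses a small stand-in dictionary with a hypothetical profile name and link (the real `likes_raw` comes from `liked_posts.json`), and the field names are the ones that appear later in this report.

```python
# Stand-in for the loaded export; the profile name and link are hypothetical.
likes_raw = {
    "likes_media_likes": [
        {
            "title": "example_profile",
            "string_list_data": [
                {
                    "href": "https://www.instagram.com/p/EXAMPLE/",
                    "value": "x",
                    "timestamp": 1724844571,
                }
            ],
        }
    ]
}

records = likes_raw["likes_media_likes"]
first = records[0]

# Print only structure (types, sizes, field names), not the values themselves.
print(type(records).__name__, len(records))
print(sorted(first.keys()))
print(sorted(first["string_list_data"][0].keys()))
```

Looking at field names only shows that each like has a `title` plus a nested `string_list_data` list, which is why a second extraction step is needed further down.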

In [35]:
likes_df_raw = pd.DataFrame(likes_raw['likes_media_likes'])
likes_df_raw

Unnamed: 0,title,string_list_data
0,bazarnov3d,[{'href': 'https://www.instagram.com/reel/C-V2...
1,sovjetanimals,[{'href': 'https://www.instagram.com/reel/C-lU...
2,alt.overload,[{'href': 'https://www.instagram.com/p/C_Loucx...
3,psychological,[{'href': 'https://www.instagram.com/reel/C_Js...
4,the_frog_mage,[{'href': 'https://www.instagram.com/p/C-OQAT2...
...,...,...
1868,frog_paradise,[{'href': 'https://www.instagram.com/reel/CwgG...
1869,westbrouck,[{'href': 'https://www.instagram.com/reel/Cwla...
1870,meowstershots,[{'href': 'https://www.instagram.com/reel/Cu6B...
1871,giuseppes.crabapples,[{'href': 'https://www.instagram.com/p/CwYWA63...


And here it is! Our raw dataframe. It is not quite tidy, and is missing some data that may be helpful in understanding just what we are looking at. The first addition we will make is a timestamp, so that we know when each of these likes occurred. To find it, we look inside one entry of the string_list_data column:

In [36]:
likes_df_raw['string_list_data'].iloc[2]

[{'href': 'https://www.instagram.com/p/C_LoucxpGQc/',
  'value': 'ð\x9f\x91\x8d',
  'timestamp': 1724844571}]
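
A side note on the `value` field in the output above: the odd-looking string is consistent with UTF-8 bytes that were read as Latin-1 characters. The snippet below is a minimal sketch of one way to repair such text, assuming that is the cause. The helper name `fix_mojibake` is hypothetical and not part of the original notebook.

```python
def fix_mojibake(text: str) -> str:
    """Re-read Latin-1-decoded text as UTF-8; return the input unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded in this way, so leave it alone.
        return text


# The same four characters shown in the 'value' field above, written as escapes.
garbled = "\u00f0\x9f\x91\x8d"
fixed = fix_mojibake(garbled)
print(fixed)  # a single thumbs-up emoji
```

The `value` column is dropped later in this report, so this repair is optional here. It matters more for exports that contain captions or comments.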

Now that we've located where the timestamp data is, we need to extract it into a usable form so that it can be separated out and added to the dataframe. We begin with the following:

In [37]:
string_list_data = [x['string_list_data'][0] for x in likes_raw['likes_media_likes']]

*[Data Display Removed for Privacy Reasons]*

In [38]:
likes_df_timed = pd.DataFrame(string_list_data)
likes_df_timed

Unnamed: 0,href,value,timestamp
0,https://www.instagram.com/reel/C-V2Q8bK5kQ/,ð,1724844609
1,https://www.instagram.com/reel/C-lUh0Co2re/,ð,1724844599
2,https://www.instagram.com/p/C_LoucxpGQc/,ð,1724844571
3,https://www.instagram.com/reel/C_JsxyASJGx/,ð,1724769088
4,https://www.instagram.com/p/C-OQAT2p_Mq/,ð,1724679492
...,...,...,...
1868,https://www.instagram.com/reel/CwgGZ3AsXP-/,ð,1693481378
1869,https://www.instagram.com/reel/CwlaW30yE39/,ð,1693432936
1870,https://www.instagram.com/reel/Cu6BmUZI3i3/,ð,1693432900
1871,https://www.instagram.com/p/CwYWA63pEZD/,ð,1693406733


We can now see the timestamp data that we can work with. Next we need to separate it from the other data and compile it back into a tidier summary dataframe.

In [39]:
timestamp = likes_df_timed.groupby('timestamp').count().sort_values('timestamp',ascending = False)
timestamp.head()

timestamp,href,value
1724844609,1,1
1724844599,1,1
1724844571,1,1
1724769088,1,1
1724679492,1,1


In [40]:
likes_df_raw['Timestamp']=likes_df_timed['timestamp']
likes_df_raw.Timestamp = likes_df_raw.Timestamp.astype(float)

In [41]:
likes_df_raw

Unnamed: 0,title,string_list_data,Timestamp
0,bazarnov3d,[{'href': 'https://www.instagram.com/reel/C-V2...,1.724845e+09
1,sovjetanimals,[{'href': 'https://www.instagram.com/reel/C-lU...,1.724845e+09
2,alt.overload,[{'href': 'https://www.instagram.com/p/C_Loucx...,1.724845e+09
3,psychological,[{'href': 'https://www.instagram.com/reel/C_Js...,1.724769e+09
4,the_frog_mage,[{'href': 'https://www.instagram.com/p/C-OQAT2...,1.724679e+09
...,...,...,...
1868,frog_paradise,[{'href': 'https://www.instagram.com/reel/CwgG...,1.693481e+09
1869,westbrouck,[{'href': 'https://www.instagram.com/reel/Cwla...,1.693433e+09
1870,meowstershots,[{'href': 'https://www.instagram.com/reel/Cu6B...,1.693433e+09
1871,giuseppes.crabapples,[{'href': 'https://www.instagram.com/p/CwYWA63...,1.693407e+09
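
The Timestamp values above are Unix epoch seconds, which are hard to read at a glance, especially once they display in scientific notation. The snippet below is a minimal sketch of converting them to calendar dates with `pd.to_datetime`. It uses three timestamps copied from the output earlier in this report, and the column names `Liked_at` and `Date` are hypothetical.

```python
import pandas as pd

# Three timestamps copied from the likes_df_timed output earlier in this report.
df = pd.DataFrame({"Timestamp": [1724844609, 1724844599, 1724844571]})

# unit="s" says the integers are seconds since 1970-01-01;
# utc=True makes the timezone of the result explicit.
df["Liked_at"] = pd.to_datetime(df["Timestamp"], unit="s", utc=True)

# A plain calendar date is convenient for grouping likes by day.
df["Date"] = df["Liked_at"].dt.date

print(df)
```

Keeping the column as integers instead of converting with `.astype(float)` would also keep the full value visible in the printed table.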


In [42]:
likes_df_comp = likes_df_raw.drop(columns = ['string_list_data'])
likes_df_comp

Unnamed: 0,title,Timestamp
0,bazarnov3d,1.724845e+09
1,sovjetanimals,1.724845e+09
2,alt.overload,1.724845e+09
3,psychological,1.724769e+09
4,the_frog_mage,1.724679e+09
...,...,...
1868,frog_paradise,1.693481e+09
1869,westbrouck,1.693433e+09
1870,meowstershots,1.693433e+09
1871,giuseppes.crabapples,1.693407e+09


The dropped column above is redundant, since we will be adding the separated data into new columns for easier access.

In [43]:
likes_df_comp['Content']=likes_df_timed['href']
likes_df_comp

Unnamed: 0,title,Timestamp,Content
0,bazarnov3d,1.724845e+09,https://www.instagram.com/reel/C-V2Q8bK5kQ/
1,sovjetanimals,1.724845e+09,https://www.instagram.com/reel/C-lUh0Co2re/
2,alt.overload,1.724845e+09,https://www.instagram.com/p/C_LoucxpGQc/
3,psychological,1.724769e+09,https://www.instagram.com/reel/C_JsxyASJGx/
4,the_frog_mage,1.724679e+09,https://www.instagram.com/p/C-OQAT2p_Mq/
...,...,...,...
1868,frog_paradise,1.693481e+09,https://www.instagram.com/reel/CwgGZ3AsXP-/
1869,westbrouck,1.693433e+09,https://www.instagram.com/reel/CwlaW30yE39/
1870,meowstershots,1.693433e+09,https://www.instagram.com/reel/Cu6BmUZI3i3/
1871,giuseppes.crabapples,1.693407e+09,https://www.instagram.com/p/CwYWA63pEZD/
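
As an alternative to the separate steps above, the same three columns could likely be produced in one pass with `pd.json_normalize`. The snippet below is a sketch under the assumption that every record keeps its details in `string_list_data`, as the outputs above suggest. The two records are copied from those outputs, and the variable name `flat` is hypothetical.

```python
import pandas as pd

# Two records rebuilt from the outputs shown earlier in this report.
likes_raw = {
    "likes_media_likes": [
        {
            "title": "bazarnov3d",
            "string_list_data": [
                {
                    "href": "https://www.instagram.com/reel/C-V2Q8bK5kQ/",
                    "value": "\u00f0\x9f\x91\x8d",
                    "timestamp": 1724844609,
                }
            ],
        },
        {
            "title": "alt.overload",
            "string_list_data": [
                {
                    "href": "https://www.instagram.com/p/C_LoucxpGQc/",
                    "value": "\u00f0\x9f\x91\x8d",
                    "timestamp": 1724844571,
                }
            ],
        },
    ]
}

# record_path expands the nested list; meta carries the outer 'title' along.
flat = pd.json_normalize(
    likes_raw["likes_media_likes"],
    record_path="string_list_data",
    meta=["title"],
)

# Match the column names and order used in this report.
flat = flat.rename(columns={"href": "Content", "timestamp": "Timestamp"})
flat = flat[["title", "Timestamp", "Content"]]
print(flat)
```

This keeps the profile name attached to its own link and timestamp, rather than relying on two dataframes staying in the same row order.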


In [44]:
likes_df_tidy = likes_df_comp.groupby('Timestamp').count().sort_values('Timestamp',ascending = False)
likes_df = likes_df_tidy.rename(columns={'title': 'Profile'})
likes_df.head()

Timestamp,Profile,Content
1724845000.0,1,1
1724845000.0,1,1
1724845000.0,1,1
1724769000.0,1,1
1724679000.0,1,1


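Although abbreviated, the table above shows the content we saw previously, organized by timestamp. Because of the ".count()" step, the Profile and Content columns now hold counts rather than the profile names and links themselves. It also gives us insight into the number of likes recorded at each timestamp.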

Next we build a hypothesis. My hypothesis is that there is a correlation between the accounts whose posts I like often and the accounts I follow. This hypothesis can be broken down as follows:

**Theoretical:**
I often follow the accounts I give the most likes to.

**Hypothetical:**
Accounts with more than one liked post will appear in my following list.
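
To make "more than one liked post" concrete, the likes can be counted per profile rather than per timestamp. The snippet below is a minimal sketch using `value_counts` on a hypothetical miniature of the `title` column. The profile names are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the 'title' column from the likes data.
likes = pd.DataFrame(
    {
        "title": [
            "profile_a",
            "profile_b",
            "profile_a",
            "profile_c",
            "profile_a",
            "profile_b",
        ]
    }
)

# Number of liked posts per profile, most-liked first.
like_counts = likes["title"].value_counts()

# Keep only profiles with more than one liked post.
repeat_profiles = like_counts[like_counts > 1]
print(repeat_profiles)
```

Grouping by profile instead of by timestamp is what turns the raw likes into the quantity the hypothesis talks about.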

To test this, I will use the "following.json" file from my Instagram data, which shows who I am currently following, so that I can cross-reference it with what we already know about my liked posts. This will greatly support our effort to test this hypothesis.

In [45]:
with open('following.json') as F:
    fol_raw = json.load(F)

With that, our new JSON file is loaded! As at the beginning of this report, we can view the raw data below:

*[Data Display Removed for Privacy Reasons]*

Next we can view the keys within that data in order to begin creating a usable dataframe.

In [46]:
fol_raw.keys()

dict_keys(['relationships_following'])

Since we have seen the "string_list_data" subcategory before with our liked posts data, we can reuse an above code cell in order to extract that data specifically.

In [47]:
fol_string_list_data = [x['string_list_data'][0] for x in fol_raw['relationships_following']]

*[Data Display Removed for Privacy Reasons]*

And bam! There is our data! Now we just need to form it into a dataframe and see what we are working with!

In [48]:
fol_df_raw = pd.DataFrame(fol_string_list_data)
fol_df_raw

Unnamed: 0,href,value,timestamp
0,https://www.instagram.com/eidens.gh0st,eidens.gh0st,1724421091
1,https://www.instagram.com/dinghuart,dinghuart,1723488296
2,https://www.instagram.com/duckeycaps,duckeycaps,1723244454
3,https://www.instagram.com/alive._.sheep,alive._.sheep,1723080311
4,https://www.instagram.com/sashacolby,sashacolby,1722905964
...,...,...,...
95,https://www.instagram.com/bantheman09,bantheman09,1694967675
96,https://www.instagram.com/xx.f4ll3n.4ng3l.xx,xx.f4ll3n.4ng3l.xx,1694602370
97,https://www.instagram.com/livvatter,livvatter,1694555676
98,https://www.instagram.com/the.screaming.bean,the.screaming.bean,1693751390


Looks like there is a bit more data here than we need. Luckily, we can drop the unnecessary column and rename the remaining columns to match our earlier ones in order to make this data a bit tidier.

In [49]:
fol_df_temp_1 = fol_df_raw.drop(columns = ['href'])
fol_df_tidy = fol_df_temp_1.rename(columns={'value': 'Profile', 'timestamp': 'Timestamp'})
fol_df_tidy.head()

Unnamed: 0,Profile,Timestamp
0,eidens.gh0st,1724421091
1,dinghuart,1723488296
2,duckeycaps,1723244454
3,alive._.sheep,1723080311
4,sashacolby,1722905964


In [50]:
likes_df.head()

Unnamed: 0_level_0,Profile,Content
Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1
1724845000.0,1,1
1724845000.0,1,1
1724845000.0,1,1
1724769000.0,1,1
1724679000.0,1,1


And there we go! Although it would require some interpreting, it appears both data sets are loaded and available! You could use the profile names, together with the timestamps, to compare the liked content against the following list and determine which profiles that received likes are followed and which are not! Although I would love to expand further and show that comparison, I am unfortunately at the extent of my knowledge at this time. But I look forward to learning more!

The new data represents the time at which each profile was followed. I believe all final dataframes meet tidy standards and can be used to test the hypothesis.
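
One possible way to run the comparison described above is sketched below. It is not part of the original analysis. It uses hypothetical toy data in place of the real likes and following dataframes, and assumes that the `title` of a like and the `Profile` of a followed account are spelled the same way.

```python
import pandas as pd

# Hypothetical stand-ins for the likes data and the following list.
likes = pd.DataFrame(
    {
        "title": [
            "profile_a",
            "profile_b",
            "profile_a",
            "profile_c",
            "profile_a",
            "profile_b",
        ]
    }
)
following = pd.DataFrame({"Profile": ["profile_a", "profile_d"]})

# One row per liked profile with its number of liked posts.
summary = (
    likes["title"]
    .value_counts()
    .rename_axis("Profile")
    .reset_index(name="Likes")
)

# Flag whether each liked profile also appears in the following list.
summary["Followed"] = summary["Profile"].isin(following["Profile"])

# The hypothesis concerns profiles with more than one liked post.
repeat = summary[summary["Likes"] > 1]
share_followed = repeat["Followed"].mean()

print(summary)
print(f"Share of repeat-liked profiles that are followed: {share_followed:.0%}")
```

A share close to 100% on the real data would support the hypothesis, while a low share would count against it. Comparing it with the same share for profiles liked only once would show whether repeated likes make a difference.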

I believe the data itself certainly has limitations. Unfortunately, the specificity of the data is often its downfall: because it only pertains to one user, it is harder to make general statements from it. Having a set of "median" reference points from Instagram would be helpful. Although other approaches to this data likely exist, I am unsure what they are at this point in the class. I believe the next steps for this report would include a review of the data and the creation of complex interactive UIs that allow deeper comparison and exploration of the data gained.

## References
Yes! I used references! While working on the project, I was unfortunately absent on workshop day due to food poisoning, so I decided to use the web as a resource to help me when I got stuck! I want to include a list of all references used and what I used them for!

Likes Report by Ryan West
https://github.com/rwest21/likes_report/blob/main/Likes_Report2.ipynb
Was used for help with the syntax of certain functions

Likes Report by Corey Heim
https://github.com/CHeim123/Likes-Report/blob/main/Insta_Likes.ipynb
Was used to see an example of a hypothetical and statistical hypothesis within an assignment

Likes Report by Andrew Gudz
https://github.com/agudz981/Gudz_Andrew_LikesReport/blob/main/Gudz_Andrew_LikeReport.ipynb
Was used as a general assignment example for style purposes

Stack Overflow FAQ
https://stackoverflow.com/questions/11346283/renaming-column-names-in-pandas
Was used to gain syntax for .rename in pandas dataframes

Stack Overflow FAQ
https://stackoverflow.com/questions/33497896/how-can-i-add-a-column-from-one-dataframe-to-another-dataframe
Was used to gain syntax for .astype