`To Do:`
- [ ] check if post type can be added as feature

---

# Instagram Like Prediction @310ai Competition

This notebook is for the competition posted by @310ai on the 15th of April. I approach the competition as a project that follows the CRISP-DM methodology and explain the approach at every step of the way.

In short, the task of this competition is: **"given an Instagram post, predict the number of likes"**.

## Business Understanding
First things first: there are some important constraints imposed by Instagram that we have to consider. These constraints determine which features are available, and those features affect the precision of the model. This section discusses them further.

***Are we trying to predict the number of likes for one of our own Instagram posts or not?***

This question might seem a little odd, so let me explain. Each Instagram post comes with metrics that show how the post performs among users. We will call these **"Performance Metrics"**. Some of them, such as the number of likes, the number of comments, the caption, etc., are publicly available; in other words, any user on Instagram can see them.

Other performance metrics are not publicly available: to see them, we need to authenticate as the owner of the page (this part is discussed further in this section). These private performance metrics include the number of shares, saves, profile visits and follows, as well as reach and impressions.

Obviously, if we try to predict the number of likes for a page that we don't own, we cannot access these private features. For this competition we will work with pages that we don't have owner access to.

Another point to keep in mind: since the post whose likes we want to predict does not exist yet, its performance metrics cannot be known precisely. In other words, how can we estimate the number of comments a hypothetical post might receive if we never actually publish it? Because of this, the performance metrics of the post itself are not good features for this task.

In the following sections I will address the questions of the competition with a combination of code and text. Please keep in mind that, to follow the chosen methodology, I might change the order of the questions.

## Data Requirements and Data Collection

In this section I tackle the questions mainly related to these parts of the challenge. The discussion above introduced some features that might affect the precision of the prediction. There are other useful features as well; below I point to features that are related to the page that published the post.

### What features did you use?

Every page on Instagram has features that distinguish it from other pages. Some of them are performance metrics like the ones discussed above, and some are identifiers. The identifier features are:
- `id`: a unique id allocated by Instagram.
- `username`: a unique username chosen by the user when the page was created.

There are also some other features that we will investigate:
- `category_name`: each page is assigned a category based on its published content and some other traits, for instance Blogger, Personal Blog, Design & Fashion, Chef, etc.
- `follower`: number of followers the page has.
- `following`: number of pages that the target page follows.
- `ar_effect`: whether the page has published AR effects on Instagram.
- `type_business`: whether the page identifies itself as a business account.
- `type_professional`: whether the page identifies itself as a professional account.
- `verified`: whether the page is verified.
- `reel_count`: number of IGTV videos posted by the page.
- `media_count`: number of posts published by the page.

Some further features are not available directly and are calculated from the collected data during feature engineering:
- `reel_avg_view`: average number of views of the IGTV videos posted by the page.
- `reel_avg_comment`: average number of comments on the IGTV videos.
- `reel_avg_like`: average number of likes on the IGTV videos.
- `reel_avg_duration`: average duration of the IGTV videos posted by the page.
- `reel_frequency`: how often the page posts reels.
- `media_avg_view`: average number of views of the media posted by the page.
- `media_avg_comment`: average number of comments on the media.
- `media_avg_like`: average number of likes on the media.
- `media_avg_duration`: average duration of the media posted by the page.
- `media_frequency`: how often the page posts media.
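
As a rough sketch of how the averaged and frequency features in this list could be derived, the snippet below computes the mean of a metric and the mean gap in days between consecutive posts from a list of Unix timestamps. The function names and the sample values are made up for illustration; only the idea (average over the fetched posts, mean time between posts) comes from the list above.

```python
from statistics import mean

SECONDS_PER_DAY = 86_400


def average(values):
    """Mean of a list of numbers, or 0.0 when the list is empty."""
    return mean(values) if values else 0.0


def posting_frequency_days(timestamps):
    """Mean gap, in days, between consecutive posts.

    `timestamps` are Unix timestamps in any order. Fewer than two
    posts give 0.0 because no gap can be measured.
    """
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps, reverse=True)  # newest first
    gaps = [newer - older for newer, older in zip(ordered, ordered[1:])]
    return mean(gaps) / SECONDS_PER_DAY


# Hypothetical sample: three posts, published one day and three days apart
sample_timestamps = [1_650_000_000, 1_650_000_000 - 86_400, 1_650_000_000 - 4 * 86_400]
sample_likes = [120, 80, 100]

print(average(sample_likes))
print(posting_frequency_days(sample_timestamps))
```

Working on raw timestamps in seconds keeps the conversion to days in one place, which makes unit mistakes easier to spot.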

Last but not least is the content of the image itself. There are multiple ways to use the image content as a feature. For instance, a classifier network can detect which objects are present in the image and pass them to the like-prediction model. Other heuristic approaches might also result in a good model, such as passing the image vector generated by the last hidden layer of a classification network as a standalone feature.
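
To make the first option more concrete, here is a minimal sketch of turning a list of detected object labels into a fixed-length 0/1 vector that a like-prediction model could consume. This is one possible encoding, not necessarily the one used later in this notebook; the labels and function names are hypothetical.

```python
def build_vocabulary(posts_objects):
    """Map every distinct object label to a fixed column index."""
    labels = sorted({label for objects in posts_objects for label in objects})
    return {label: index for index, label in enumerate(labels)}


def multi_hot(objects, vocabulary):
    """Return a 0/1 vector marking which known labels appear in one post."""
    vector = [0] * len(vocabulary)
    for label in objects:
        index = vocabulary.get(label)
        if index is not None:  # labels missing from the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical detector output for three posts
detected = [["person", "beach"], ["dog"], ["person", "dog", "text"]]
vocabulary = build_vocabulary(detected)
features = [multi_hot(objects, vocabulary) for objects in detected]

for row in features:
    print(row)
```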

Choosing the best strategy requires experiments, such as A/B tests and trial and error. For now I will choose the fastest, heuristic strategy, which is discussed next.

For some time now, Meta has been using an object detection model to generate the Alt Text attribute of posts. Given the resources Meta has at its disposal, this model is extremely fast and reliable, since it runs on the server side. For this approach we will therefore use the objects that this model reports as present in the image as features.
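
The sketch below shows one way the object labels could be pulled out of such an alt text. It assumes captions shaped like "<credit>. May be an image of a, b and c."; that shape, the sample caption and the function name are assumptions for illustration, and real captions may differ.

```python
import re


def parse_alt_text(caption):
    """Extract object labels from an alt-text caption.

    Assumes the objects follow the words "image of" and are separated
    by commas and a final " and ". Anything else yields an empty list.
    """
    if not caption:
        return []
    match = re.search(r"image of (.+?)\.?$", caption.strip())
    if match is None:
        return []
    # Split on commas and on " and " surrounded by spaces, so that words
    # containing "and" (for example "standing") stay intact.
    parts = re.split(r",| and ", match.group(1))
    return [part.strip() for part in parts if part.strip()]


# Hypothetical caption in the assumed format
sample_caption = "Photo by someone on April 10, 2022. May be an image of 1 person, standing and outdoors."
print(parse_alt_text(sample_caption))
```

Matching on " and " with surrounding spaces is a deliberate choice here: a plain substring split would cut labels such as "standing" in two.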

## How do we collect the data?

As discussed above, there are different kinds of features, and each group can be collected with a different method.

Instagram provides an API for developers, but due to some restrictions and limitations this API cannot provide the data that we seek. Because of this, we will collect the data in a heuristic way. There are two approaches.

The first approach, which is not very tech-friendly (:D), is to create a scraper with the Selenium package in Python to scrape the information we need. Selenium is a website testing library that can also be used as a web scraper. This approach has another limitation that is exclusive to users like me: since I'm in Iran right now, access to Instagram is restricted and we have to use VPNs and other geo-restriction bypasses. These tools add another layer of challenge and an additional bottleneck.

The second approach, which is the one I use, is to call the GraphQL endpoints and receive the information in JSON format. Even though VPNs and similar tools are still needed, this approach, unlike Selenium, doesn't have to load the Instagram GUI, so it is much faster and better suited to a pipeline.

- endpoint for user information:
`https://www.instagram.com/{username}/?__a=1&__d=dis`

- endpoint for post information:
`https://www.instagram.com/p/{post_ID}/?__a=1&__d=dis`
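
As a small illustration, the two URL templates above can be filled in with helper functions like the ones below. The helper names are hypothetical, and the sketch only builds the URLs; it does not send any request.

```python
from urllib.parse import quote

BASE_URL = "https://www.instagram.com"
QUERY = "__a=1&__d=dis"


def profile_url(username):
    """URL of the JSON endpoint for a user's profile information."""
    return f"{BASE_URL}/{quote(username, safe='')}/?{QUERY}"


def post_url(post_id):
    """URL of the JSON endpoint for a single post."""
    return f"{BASE_URL}/p/{quote(post_id, safe='')}/?{QUERY}"


print(profile_url("natgeo"))
print(post_url("EXAMPLE_ID"))  # placeholder id, not a real post
```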

Getting training data for the model:
- each JSON response for an account contains the 12 latest posts and their information:

  - Alt text information is here: `data['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node']['accessibility_caption']`
    - each node has a type; `GraphImage` nodes are image posts, which have alt text.
    - `GraphVideo` doesn't have alt text.
    - `GraphSidecar` is a carousel and has alt text.
  - number of comments is here: `data['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node']['edge_media_to_comment']['count']`
  - number of likes is here: `data['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node']['edge_liked_by']['count']`
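
The paths above can be walked for every post in the response rather than only for index `0`. The following is a minimal sketch on a synthetic, heavily trimmed response with made-up numbers; the function name is hypothetical, and the use of `.get` for the alt text is an assumption to cover nodes where the field is absent or empty.

```python
def extract_posts(data):
    """Collect type, likes, comments and alt text for each post in a profile response."""
    edges = data["graphql"]["user"]["edge_owner_to_timeline_media"]["edges"]
    posts = []
    for edge in edges:
        node = edge["node"]
        posts.append({
            "type": node["__typename"],
            "likes": node["edge_liked_by"]["count"],
            "comments": node["edge_media_to_comment"]["count"],
            # Video nodes carry no alt text, so this falls back to None
            "alt_text": node.get("accessibility_caption"),
        })
    return posts


# Synthetic response shaped like the paths listed above (values are invented)
sample = {
    "graphql": {"user": {"edge_owner_to_timeline_media": {"edges": [
        {"node": {
            "__typename": "GraphImage",
            "accessibility_caption": "Photo by someone. May be an image of 1 person.",
            "edge_media_to_comment": {"count": 12},
            "edge_liked_by": {"count": 340},
        }},
        {"node": {
            "__typename": "GraphVideo",
            "edge_media_to_comment": {"count": 3},
            "edge_liked_by": {"count": 50},
        }},
    ]}}}
}

for post in extract_posts(sample):
    print(post)
```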


In [4]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import requests
from datetime import datetime
import json
import re
import numpy as np
import pandas as pd
from tqdm import tqdm
import time

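# Read the account lists used to gather training data.
# Each file holds a single comma-separated line of usernames; stripping
# removes the trailing newline and any stray spaces around the names.
with open('Data/top_100_follower.txt') as f:
    top_100_followers = [name.strip() for name in f.readline().split(',')]

with open('Data/top_100_posts.txt') as f:
    top_100_posts = [name.strip() for name in f.readline().split(',')]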

# NOTE: the data-collection cell rebuilds these frames in its try/except block when the CSV files cannot be read, so this initialization is probably unnecessary; double-check it.
main_accounts_df = pd.DataFrame(columns=['id', 'username', 'category_name', 'follower', 'following', 'ar_effect', 'type_business', 'type_professional', 'verified', 'reel_count', 'reel_avg_view', 'reel_avg_comment', 'reel_avg_like', 'reel_avg_duration', 'reel_frequency', 'media_count', 'media_avg_comment', 'media_avg_like', 'media_frequency'])
main_posts_df = pd.DataFrame(columns=['shortcode', 'post_type', 'username', 'like', 'comment', 'object_1', 'object_2', 'object_3', 'object_4', 'object_5','object_6'])

def flatten(lst):
    """Flatten an arbitrarily nested Python list into a 1D list.

    Args:
        lst (list): nested (multi-dimensional) Python list

    Returns:
        list: flattened list
    """
    rt = []
    for i in lst:
        if isinstance(i, list):
            # recurse into nested lists
            rt.extend(flatten(i))
        else:
            rt.append(i)
    return rt

#### Logging into the Instagram account
This step is necessary for getting information about the images, since the majority of information on Instagram is locked behind the authentication wall.
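
The login cell reads a CSRF token out of the login page with a regular expression before posting the credentials. As a small sketch of that single step, the snippet below applies the same pattern to a synthetic page fragment and returns `None` instead of failing when no token is present. The fragment and the function name are made up; the real page is much larger and its layout may change.

```python
import re

# Same pattern as in the login cell: the token sits inside an escaped JSON string
CSRF_PATTERN = re.compile(r'csrf_token\\":\\"(.*?)\\"')


def find_csrf_token(page_text):
    """Return the first CSRF token found in the page text, or None if absent."""
    match = CSRF_PATTERN.search(page_text)
    return match.group(1) if match else None


# Synthetic fragment with an invented token value
sample_page = r'<script>{"config":"{\"csrf_token\":\"abc123\",\"viewer\":null}"}</script>'

print(find_csrf_token(sample_page))
print(find_csrf_token("<html></html>"))
```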

In [21]:
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
            'referer':'https://www.instagram.com/',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'TE': 'trailers'
}


current_time = int(datetime.now().timestamp())
response = requests.Session().get(link, headers=headers)
if response.ok:
    csrf = re.findall(r'csrf_token\\":\\"(.*?)\\"',response.text)[0]
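    # Credentials are read from the environment so that they are not stored in
    # the notebook. The variable names are placeholders; set them before running.
    import os
    username = os.environ['INSTAGRAM_USERNAME']
    password = os.environ['INSTAGRAM_PASSWORD']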

    payload = {
        'username': username,
        'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{current_time}:{password}',
        'queryParams': {},
        'optIntoOneTap': 'false',
        'stopDeletionNonce': '',
        'trustedDeviceRecords': '{}'
    }

    login_header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
        "X-CSRFToken": csrf,
        'Accept': '*/*',
        'Accept-Language': 'en-US,en;q=0.5',
        'X-Instagram-AJAX': 'c6412f1b1b7b',
        'X-IG-App-ID': '936619743392459',
        'X-ASBD-ID': '198387',
        'X-IG-WWW-Claim': '0',
        'X-Requested-With': 'XMLHttpRequest',
        'Origin': 'https://www.instagram.com',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://www.instagram.com/accounts/login/?',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
    }

    login_response = requests.post(login_url, data=payload, headers=login_header)
    json_data = json.loads(login_response.text)


    if json_data['status'] == 'fail':
        print(json_data['message'])

    elif json_data["authenticated"]:
        print("login successful")
        cookies = login_response.cookies
        cookie_jar = cookies.get_dict()
        csrf_token = cookie_jar['csrftoken']
        print("csrf_token: ", csrf_token)
        session_id = cookie_jar['sessionid']
        print("session_id: ", session_id)

    else:
        print("login failed ", login_response.text)
else:
    print('error')
    print(response)

login successful
csrf_token:  9vHtMLb9fM1vXgzhYgnmPm6hiGcDnTxk
session_id:  1691538713%3A4U7RAkN6UQ9HWH%3A5%3AAYcsGkH_-69pYpwEIxjJATIipnc9OVEwa-YlVqaaKQ


#### Collecting Data

In [22]:
# Resume from previously saved CSV files when they exist; otherwise start from empty frames.
try:
    main_accounts_df = pd.read_csv('Data/accounts.csv')
    main_posts_df = pd.read_csv('Data/posts.csv')
    main_accounts_df.drop(columns=['Unnamed: 0'], inplace=True)
    main_posts_df.drop(columns=['Unnamed: 0'], inplace=True)
except (FileNotFoundError, KeyError, pd.errors.EmptyDataError):
    main_accounts_df = pd.DataFrame(columns=['id', 'username', 'category_name', 'follower', 'following', 'ar_effect', 'type_business', 'type_professional', 'verified', 'reel_count', 'reel_avg_view', 'reel_avg_comment', 'reel_avg_like', 'reel_avg_duration', 'reel_frequency', 'media_count', 'media_avg_comment', 'media_avg_like', 'media_frequency'])
    main_posts_df = pd.DataFrame(columns=['shortcode', 'post_type', 'username', 'like', 'comment', 'object_1', 'object_2', 'object_3', 'object_4', 'object_5','object_6'])


for username in tqdm(top_100_followers):
    print(f'Getting Account Information: {username}')
    # loading account information
    session = {
            "csrf_token": csrf_token,
            "session_id": session_id
        }

    headers = {
            "x-csrftoken": session['csrf_token'],
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest',
            'Referer': 'https://www.instagram.com/accounts/login/?',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
            'X-Instagram-AJAX': 'c6412f1b1b7b',
            'X-IG-App-ID': '936619743392459',
            'X-ASBD-ID': '198387',
            'X-IG-WWW-Claim': '0',
            'Origin': 'https://www.instagram.com',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'TE': 'trailers'
        }

    cookies = {
            "sessionid": session['session_id'],
            "csrftoken": session['csrf_token']
        }
    url = f'https://www.instagram.com/{username}/?__a=1&__d=dis'
    res = requests.get(url, headers=headers, cookies=cookies)
    # add error handling here based on response codes, reference -> InstagramBot.py

    data = res.json()
    followers = data['graphql']['user']['edge_followed_by']['count']
    following = data['graphql']['user']['edge_follow']['count']
    ar_effect = data['graphql']['user']['has_ar_effects']
    id = data['graphql']['user']['id']
    type_business = data['graphql']['user']['is_business_account']
    type_professional = data['graphql']['user']['is_professional_account']
    category = data['graphql']['user']['category_name']
    verified = data['graphql']['user']['is_verified']
    reel_count = data['graphql']['user']['edge_felix_video_timeline']['count']
    media_count = data['graphql']['user']['edge_owner_to_timeline_media']['count']
    username = data['graphql']['user']['username']

    reel_view_list = []
    reel_like_list = []
    reel_comment_list = []
    reel_duration_list = []
    reel_timestamp_list = []

    media_like_list = []
    media_comment_list = []
    media_timestamp_list = []

    for video in data['graphql']['user']['edge_felix_video_timeline']['edges']:
        reel_view_list.append(video['node']['video_view_count'])
        reel_comment_list.append(video['node']['edge_media_to_comment']['count'])
        reel_timestamp_list.append(video['node']['taken_at_timestamp'])
        reel_like_list.append(video['node']['edge_liked_by']['count'])
        reel_duration_list.append(video['node']['video_duration'])
    
    # sometimes instagram result for video duration is None, this is sanity check
    reel_duration_list = [0 if duration is None else duration for duration in reel_duration_list]

    for medium in data['graphql']['user']['edge_owner_to_timeline_media']['edges']:
        media_like_list.append(medium['node']['edge_liked_by']['count'])
        media_comment_list.append(medium['node']['edge_media_to_comment']['count'])
        media_timestamp_list.append(medium['node']['taken_at_timestamp'])

    
    reel_utc_list = [datetime.utcfromtimestamp(ts) for ts in reel_timestamp_list]
    media_utc_list = [datetime.utcfromtimestamp(ts) for ts in media_timestamp_list]

    reel_utc_difference_list = [reel_utc_list[i] - reel_utc_list[i+1] for i in range(len(reel_utc_list) - 1)]
    media_utc_difference_list = [media_utc_list[i] - media_utc_list[i+1] for i in range(len(media_utc_list) - 1)]

    # Mean gap between consecutive posts, in days; at least two posts are needed to measure a gap.
    if reel_utc_difference_list:
        reel_frequency = np.mean(reel_utc_difference_list).total_seconds() / 86_400
    else:
        reel_frequency = 0
    if media_utc_difference_list:
        media_frequency = np.mean(media_utc_difference_list).total_seconds() / 86_400
    else:
        media_frequency = 0

    reel_view_mean = np.mean(reel_view_list)
    reel_like_mean = np.mean(reel_like_list)
    reel_comment_mean = np.mean(reel_comment_list)
    reel_duration_mean = np.mean(reel_duration_list)

    media_like_mean = np.mean(media_like_list)
    media_comment_mean = np.mean(media_comment_list)

    entry_lst = [id, username, category, followers, following, ar_effect, type_business, type_professional, verified, reel_count, reel_view_mean, reel_comment_mean, reel_like_mean, reel_duration_mean, reel_frequency, media_count, media_comment_mean, media_like_mean, media_frequency]

    account_df = pd.DataFrame([entry_lst], columns=['id', 'username', 'category_name', 'follower', 'following', 'ar_effect', 'type_business', 'type_professional', 'verified', 'reel_count', 'reel_avg_view', 'reel_avg_comment', 'reel_avg_like', 'reel_avg_duration', 'reel_frequency', 'media_count', 'media_avg_comment', 'media_avg_like', 'media_frequency'])

    # .any() replaces the deprecated Series.bool()
    if account_df.username.isin(main_accounts_df.username).any():
        print('User information already exists, skipping...')
        continue
    else:
        print(f'Adding {username} information...')
        account_df = account_df.astype({
            'ar_effect': bool,
            'type_business': bool,
            'type_professional': bool,
            'verified': bool,
        })
        main_accounts_df = pd.concat([main_accounts_df, account_df], axis=0, join='outer')
    
    # adding user's posts information
    print(f'Getting Posts Information: {username}')
    # main lists structure is:
    # shortcode, post_type, username, objects
    posts_lst = []
    for post in data['graphql']['user']['edge_owner_to_timeline_media']['edges']:
        temp_lst = []
        objects = []
        temp_lst.append(post['node']['shortcode'])
        temp_lst.append(post['node']['__typename'])
        temp_lst.append(data['graphql']['user']['username'])
        temp_lst.append(post['node']['edge_liked_by']['count'])
        temp_lst.append(post['node']['edge_media_to_comment']['count'])
        if post['node']['__typename'] in ('GraphImage', 'GraphSidecar'):
            # posts without alt text are skipped entirely
            if post['node']['accessibility_caption'] is None:
                continue
            # split object-detection output
            objects = post['node']['accessibility_caption'].split('.')[1]
            # an empty sentence means no objects were reported
            if objects:
                try:
                    # cutting objects: keep the list after " of ", then separate the
                    # comma-separated labels from the final label after " and ".
                    # Splitting on the spaced words avoids cutting labels such as "standing".
                    objects = objects.split(' of ', 1)[1]
                    objects = objects.split(' and ', 1)
                    objects[0] = objects[0].split(',')
                    if 'text' in objects[1]:
                        objects[1] = 'text'
                except IndexError:
                    # the caption does not follow the expected pattern, skip this post
                    continue
                # flattening the objects list to make the dimension 1D
                objects = flatten(objects)
                # removing leading and trailing spaces from list items
                objects = [item.strip() for item in objects]
            else:
                objects = []
        # padding the objects list, we set the limit to 6 objects
        objects += ['No Object'] * (6 - len(objects))
        if len(objects) > 6:
            objects = objects[:6]
        temp_lst.append(objects)
        posts_lst.append(flatten(temp_lst))

    # creating temporary dataframe for posts of this account
    temp_df = pd.DataFrame(posts_lst, columns=[
        'shortcode',
        'post_type',
        'username',
        'like',
        'comment',
        'object_1',
        'object_2',
        'object_3',
        'object_4',
        'object_5',
        'object_6'
    ])

    # .any() also works when no post of this account was kept (empty frame)
    if temp_df.username.isin(main_posts_df.username).any():
        print('User post information already exists, skipping...')
        continue
    else:
        print(f'Adding {username} posts information...')
        main_posts_df = pd.concat([main_posts_df, temp_df], axis=0, join='outer')
    
    # saving the data each time
    main_accounts_df.to_csv('Data/accounts.csv')
    main_posts_df.to_csv('Data/posts.csv')

    # waiting 5 sec for each  user, instagram rate limit
    print('Waiting 5 seconds...')
    time.sleep(5)

  0%|          | 0/100 [00:00<?, ?it/s]

Getting Account Information: instagram
Adding instagram information...
Getting Posts Information: instagram
Adding instagram posts information...
Waiting 5 seconds...


  1%|          | 1/100 [00:07<11:59,  7.26s/it]

Getting Account Information: cristiano
Adding cristiano information...
Getting Posts Information: cristiano
Adding cristiano posts information...
Waiting 5 seconds...


  2%|▏         | 2/100 [00:14<11:39,  7.14s/it]

Getting Account Information: leomessi
Adding leomessi information...
Getting Posts Information: leomessi
Adding leomessi posts information...
Waiting 5 seconds...


  3%|▎         | 3/100 [00:21<11:16,  6.97s/it]

Getting Account Information: selenagomez
Adding selenagomez information...
Getting Posts Information: selenagomez
Adding selenagomez posts information...
Waiting 5 seconds...


  4%|▍         | 4/100 [00:27<11:06,  6.94s/it]

Getting Account Information: kyliejenner
Adding kyliejenner information...
Getting Posts Information: kyliejenner
Adding kyliejenner posts information...
Waiting 5 seconds...


  5%|▌         | 5/100 [00:34<10:51,  6.86s/it]

Getting Account Information: therock
Adding therock information...
Getting Posts Information: therock
Adding therock posts information...
Waiting 5 seconds...


  6%|▌         | 6/100 [00:42<11:12,  7.15s/it]

Getting Account Information: arianagrande
Adding arianagrande information...
Getting Posts Information: arianagrande
Adding arianagrande posts information...
Waiting 5 seconds...


  7%|▋         | 7/100 [00:49<11:10,  7.21s/it]

Getting Account Information: kimkardashian
Adding kimkardashian information...
Getting Posts Information: kimkardashian
Adding kimkardashian posts information...
Waiting 5 seconds...


  8%|▊         | 8/100 [00:57<11:06,  7.25s/it]

Getting Account Information: beyonce
Adding beyonce information...
Getting Posts Information: beyonce
Adding beyonce posts information...
Waiting 5 seconds...


  9%|▉         | 9/100 [01:04<11:14,  7.41s/it]

Getting Account Information: khloekardashian
Adding khloekardashian information...
Getting Posts Information: khloekardashian
Adding khloekardashian posts information...
Waiting 5 seconds...


 10%|█         | 10/100 [01:11<10:56,  7.30s/it]

Getting Account Information: justinbieber
Adding justinbieber information...
Getting Posts Information: justinbieber
Adding justinbieber posts information...
Waiting 5 seconds...


 11%|█         | 11/100 [01:19<10:46,  7.26s/it]

Getting Account Information: nike
Adding nike information...
Getting Posts Information: nike
Adding nike posts information...
Waiting 5 seconds...


 12%|█▏        | 12/100 [01:26<10:38,  7.25s/it]

Getting Account Information: kendalljenner
Adding kendalljenner information...
Getting Posts Information: kendalljenner
Adding kendalljenner posts information...
Waiting 5 seconds...


 13%|█▎        | 13/100 [01:33<10:20,  7.13s/it]

Getting Account Information: natgeo
Adding natgeo information...
Getting Posts Information: natgeo
Adding natgeo posts information...
Waiting 5 seconds...


 14%|█▍        | 14/100 [01:40<10:18,  7.19s/it]

Getting Account Information: taylorswift
Adding taylorswift information...
Getting Posts Information: taylorswift
Adding taylorswift posts information...
Waiting 5 seconds...


 15%|█▌        | 15/100 [01:47<10:09,  7.17s/it]

Getting Account Information: virat.kohli
Adding virat.kohli information...
Getting Posts Information: virat.kohli
Adding virat.kohli posts information...
Waiting 5 seconds...


 16%|█▌        | 16/100 [01:54<09:50,  7.03s/it]

Getting Account Information: jlo
Adding jlo information...
Getting Posts Information: jlo
Adding jlo posts information...
Waiting 5 seconds...


 17%|█▋        | 17/100 [02:01<09:38,  6.97s/it]

Getting Account Information: kourtneykardash
Adding kourtneykardash information...
Getting Posts Information: kourtneykardash
Adding kourtneykardash posts information...
Waiting 5 seconds...


 18%|█▊        | 18/100 [02:08<09:45,  7.14s/it]

Getting Account Information: nickiminaj
Adding nickiminaj information...
Getting Posts Information: nickiminaj
Adding nickiminaj posts information...
Waiting 5 seconds...


 19%|█▉        | 19/100 [02:16<09:56,  7.36s/it]

Getting Account Information: neymarjr
Adding neymarjr information...
Getting Posts Information: neymarjr
Adding neymarjr posts information...
Waiting 5 seconds...


 20%|██        | 20/100 [02:23<09:41,  7.27s/it]

Getting Account Information: mileycyrus
Adding mileycyrus information...
Getting Posts Information: mileycyrus
Adding mileycyrus posts information...
Waiting 5 seconds...


 21%|██        | 21/100 [02:30<09:32,  7.25s/it]

Getting Account Information: katyperry
Adding katyperry information...
Getting Posts Information: katyperry
Adding katyperry posts information...
Waiting 5 seconds...


 22%|██▏       | 22/100 [02:38<09:39,  7.43s/it]

Getting Account Information: zendaya
Adding zendaya information...
Getting Posts Information: zendaya
Adding zendaya posts information...
Waiting 5 seconds...


 23%|██▎       | 23/100 [02:45<09:29,  7.40s/it]

Getting Account Information: kevinhart4real
Adding kevinhart4real information...
Getting Posts Information: kevinhart4real
Adding kevinhart4real posts information...
Waiting 5 seconds...


 24%|██▍       | 24/100 [02:53<09:21,  7.39s/it]

Getting Account Information: ddlovato
Adding ddlovato information...
Getting Posts Information: ddlovato
Adding ddlovato posts information...
Waiting 5 seconds...


 25%|██▌       | 25/100 [03:00<09:11,  7.36s/it]

Getting Account Information: kingjames
Adding kingjames information...
Getting Posts Information: kingjames
Adding kingjames posts information...
Waiting 5 seconds...


 26%|██▌       | 26/100 [03:08<09:06,  7.38s/it]

Getting Account Information: badgalriri
Adding badgalriri information...
Getting Posts Information: badgalriri
Adding badgalriri posts information...
Waiting 5 seconds...


 27%|██▋       | 27/100 [03:15<08:54,  7.32s/it]

Getting Account Information: realmadrid
Adding realmadrid information...
Getting Posts Information: realmadrid
Adding realmadrid posts information...
Waiting 5 seconds...


 28%|██▊       | 28/100 [03:23<09:09,  7.63s/it]

Getting Account Information: champagnepapi
Adding champagnepapi information...
Getting Posts Information: champagnepapi
Adding champagnepapi posts information...
Waiting 5 seconds...


 29%|██▉       | 29/100 [03:32<09:28,  8.01s/it]

Getting Account Information: chrisbrownofficial
Adding chrisbrownofficial information...
Getting Posts Information: chrisbrownofficial
Adding chrisbrownofficial posts information...
Waiting 5 seconds...


 30%|███       | 30/100 [03:40<09:17,  7.96s/it]

Getting Account Information: fcbarcelona
Adding fcbarcelona information...
Getting Posts Information: fcbarcelona
Adding fcbarcelona posts information...
Waiting 5 seconds...


 31%|███       | 31/100 [03:50<09:51,  8.57s/it]

Getting Account Information: billieeilish
Adding billieeilish information...
Getting Posts Information: billieeilish
Adding billieeilish posts information...
Waiting 5 seconds...


 32%|███▏      | 32/100 [04:00<10:09,  8.97s/it]

Getting Account Information: championsleague
Adding championsleague information...
Getting Posts Information: championsleague
Adding championsleague posts information...
Waiting 5 seconds...


 33%|███▎      | 33/100 [04:09<09:59,  8.95s/it]

Getting Account Information: k.mbappe
Adding k.mbappe information...
Getting Posts Information: k.mbappe
Adding k.mbappe posts information...
Waiting 5 seconds...


 34%|███▍      | 34/100 [04:16<09:25,  8.57s/it]

Getting Account Information: gal_gadot
Adding gal_gadot information...
Getting Posts Information: gal_gadot
Adding gal_gadot posts information...
Waiting 5 seconds...


 35%|███▌      | 35/100 [04:25<09:12,  8.50s/it]

Getting Account Information: vindiesel
Adding vindiesel information...
Getting Posts Information: vindiesel
Adding vindiesel posts information...
Waiting 5 seconds...


 36%|███▌      | 36/100 [04:32<08:48,  8.26s/it]

Getting Account Information: lalalalisa_m
Adding lalalalisa_m information...
Getting Posts Information: lalalalisa_m
Adding lalalalisa_m posts information...
Waiting 5 seconds...


 37%|███▋      | 37/100 [04:41<08:52,  8.45s/it]

Getting Account Information: nasa
Adding nasa information...
Getting Posts Information: nasa
Adding nasa posts information...
Waiting 5 seconds...


 38%|███▊      | 38/100 [04:50<08:45,  8.47s/it]

Getting Account Information: dualipa
Adding dualipa information...
Getting Posts Information: dualipa
Adding dualipa posts information...
Waiting 5 seconds...


 39%|███▉      | 39/100 [05:02<09:41,  9.53s/it]

Getting Account Information: priyankachopra
Adding priyankachopra information...
Getting Posts Information: priyankachopra
Adding priyankachopra posts information...
Waiting 5 seconds...


 40%|████      | 40/100 [05:12<09:35,  9.59s/it]

Getting Account Information: shakira
Adding shakira information...
Getting Posts Information: shakira
Adding shakira posts information...
Waiting 5 seconds...


 41%|████      | 41/100 [05:18<08:37,  8.78s/it]

Getting Account Information: snoopdogg
Adding snoopdogg information...
Getting Posts Information: snoopdogg
Adding snoopdogg posts information...
Waiting 5 seconds...


 42%|████▏     | 42/100 [05:28<08:34,  8.88s/it]

Getting Account Information: shraddhakapoor
Adding shraddhakapoor information...
Getting Posts Information: shraddhakapoor
Adding shraddhakapoor posts information...
Waiting 5 seconds...


 43%|████▎     | 43/100 [05:34<07:52,  8.30s/it]

Getting Account Information: khaby00
Adding khaby00 information...
Getting Posts Information: khaby00
Adding khaby00 posts information...
Waiting 5 seconds...


 44%|████▍     | 44/100 [05:41<07:19,  7.84s/it]

Getting Account Information: nba
Adding nba information...
Getting Posts Information: nba
Adding nba posts information...
Waiting 5 seconds...


 45%|████▌     | 45/100 [05:52<07:54,  8.62s/it]

Getting Account Information: davidbeckham
Adding davidbeckham information...
Getting Posts Information: davidbeckham
Adding davidbeckham posts information...
Waiting 5 seconds...


 46%|████▌     | 46/100 [05:58<07:13,  8.03s/it]

Getting Account Information: gigihadid
Adding gigihadid information...
Getting Posts Information: gigihadid
Adding gigihadid posts information...
Waiting 5 seconds...


 47%|████▋     | 47/100 [06:05<06:49,  7.73s/it]

Getting Account Information: jennierubyjane
Adding jennierubyjane information...
Getting Posts Information: jennierubyjane
Adding jennierubyjane posts information...
Waiting 5 seconds...


 48%|████▊     | 48/100 [06:13<06:36,  7.63s/it]

Getting Account Information: aliaabhatt
Adding aliaabhatt information...
Getting Posts Information: aliaabhatt
Adding aliaabhatt posts information...
Waiting 5 seconds...


 49%|████▉     | 49/100 [06:20<06:17,  7.40s/it]

Getting Account Information: victoriassecret
Adding victoriassecret information...
Getting Posts Information: victoriassecret
Adding victoriassecret posts information...
Waiting 5 seconds...


 50%|█████     | 50/100 [06:26<06:01,  7.23s/it]

Getting Account Information: narendramodi
Adding narendramodi information...
Getting Posts Information: narendramodi
Adding narendramodi posts information...
Waiting 5 seconds...


 51%|█████     | 51/100 [06:33<05:46,  7.07s/it]

Getting Account Information: nehakakkar
Adding nehakakkar information...
Getting Posts Information: nehakakkar
Adding nehakakkar posts information...
Waiting 5 seconds...


 52%|█████▏    | 52/100 [06:40<05:36,  7.01s/it]

Getting Account Information: bts.bighitofficial
Adding bts.bighitofficial information...
Getting Posts Information: bts.bighitofficial
Adding bts.bighitofficial posts information...
Waiting 5 seconds...


 53%|█████▎    | 53/100 [06:47<05:23,  6.88s/it]

Getting Account Information: ronaldinho
Adding ronaldinho information...
Getting Posts Information: ronaldinho
Adding ronaldinho posts information...
Waiting 5 seconds...


 54%|█████▍    | 54/100 [06:54<05:17,  6.90s/it]

Getting Account Information: deepikapadukone
Adding deepikapadukone information...
Getting Posts Information: deepikapadukone
Adding deepikapadukone posts information...
Waiting 5 seconds...


 55%|█████▌    | 55/100 [07:00<05:08,  6.85s/it]

Getting Account Information: shawnmendes
Adding shawnmendes information...
Getting Posts Information: shawnmendes
Adding shawnmendes posts information...
Waiting 5 seconds...


 56%|█████▌    | 56/100 [07:07<05:03,  6.90s/it]

Getting Account Information: katrinakaif
Adding katrinakaif information...
Getting Posts Information: katrinakaif
Adding katrinakaif posts information...
Waiting 5 seconds...


 57%|█████▋    | 57/100 [07:14<04:55,  6.88s/it]

Getting Account Information: sooyaaa__
Adding sooyaaa__ information...
Getting Posts Information: sooyaaa__
Adding sooyaaa__ posts information...
Waiting 5 seconds...


 58%|█████▊    | 58/100 [07:21<04:51,  6.94s/it]

Getting Account Information: psg
Adding psg information...
Getting Posts Information: psg
Adding psg posts information...
Waiting 5 seconds...


 59%|█████▉    | 59/100 [07:29<04:54,  7.18s/it]

Getting Account Information: emmawatson
Adding emmawatson information...
Getting Posts Information: emmawatson
Adding emmawatson posts information...
Waiting 5 seconds...


 60%|██████    | 60/100 [07:35<04:38,  6.97s/it]

Getting Account Information: roses_are_rosie
Adding roses_are_rosie information...
Getting Posts Information: roses_are_rosie
Adding roses_are_rosie posts information...
Waiting 5 seconds...


 61%|██████    | 61/100 [07:43<04:35,  7.06s/it]

Getting Account Information: justintimberlake
Adding justintimberlake information...
Getting Posts Information: justintimberlake
Adding justintimberlake posts information...
Waiting 5 seconds...


 62%|██████▏   | 62/100 [07:50<04:32,  7.17s/it]

Getting Account Information: karimbenzema
Adding karimbenzema information...
Getting Posts Information: karimbenzema
Adding karimbenzema posts information...
Waiting 5 seconds...


 63%|██████▎   | 63/100 [07:57<04:18,  7.00s/it]

Getting Account Information: raffinagita1717
Adding raffinagita1717 information...
Getting Posts Information: raffinagita1717
Adding raffinagita1717 posts information...
Waiting 5 seconds...


 64%|██████▍   | 64/100 [08:05<04:26,  7.40s/it]

Getting Account Information: marvel
Adding marvel information...
Getting Posts Information: marvel
Adding marvel posts information...
Waiting 5 seconds...


 65%|██████▌   | 65/100 [08:13<04:20,  7.43s/it]

Getting Account Information: tomholland2013
Adding tomholland2013 information...
Getting Posts Information: tomholland2013
Adding tomholland2013 posts information...
Waiting 5 seconds...


 66%|██████▌   | 66/100 [08:19<04:04,  7.19s/it]

Getting Account Information: camila_cabello
Adding camila_cabello information...
Getting Posts Information: camila_cabello
Adding camila_cabello posts information...
Waiting 5 seconds...


 67%|██████▋   | 67/100 [08:26<03:58,  7.22s/it]

Getting Account Information: jacquelinef143
Adding jacquelinef143 information...
Getting Posts Information: jacquelinef143
Adding jacquelinef143 posts information...
Waiting 5 seconds...


 68%|██████▊   | 68/100 [08:34<03:49,  7.19s/it]

Getting Account Information: premierleague
Adding premierleague information...
Getting Posts Information: premierleague
Adding premierleague posts information...
Waiting 5 seconds...


 69%|██████▉   | 69/100 [08:42<03:52,  7.51s/it]

Getting Account Information: akshaykumar
Adding akshaykumar information...
Getting Posts Information: akshaykumar
Adding akshaykumar posts information...
Waiting 5 seconds...


 70%|███████   | 70/100 [08:50<03:49,  7.64s/it]

Getting Account Information: anitta
Adding anitta information...
Getting Posts Information: anitta
Adding anitta posts information...
Waiting 5 seconds...


 71%|███████   | 71/100 [08:58<03:44,  7.73s/it]

Getting Account Information: urvashirautela
Adding urvashirautela information...
Getting Posts Information: urvashirautela
Adding urvashirautela posts information...
Waiting 5 seconds...


 72%|███████▏  | 72/100 [09:05<03:31,  7.57s/it]

Getting Account Information: anushkasharma
Adding anushkasharma information...
Getting Posts Information: anushkasharma
Adding anushkasharma posts information...
Waiting 5 seconds...


 73%|███████▎  | 73/100 [09:12<03:22,  7.51s/it]

Getting Account Information: willsmith
Adding willsmith information...
Getting Posts Information: willsmith
Adding willsmith posts information...
Waiting 5 seconds...


 74%|███████▍  | 74/100 [09:20<03:14,  7.48s/it]

Getting Account Information: maluma
Adding maluma information...
Getting Posts Information: maluma
Adding maluma posts information...
Waiting 5 seconds...


 75%|███████▌  | 75/100 [09:27<03:02,  7.31s/it]

Getting Account Information: milliebobbybrown
Adding milliebobbybrown information...
Getting Posts Information: milliebobbybrown
Adding milliebobbybrown posts information...
Waiting 5 seconds...


 76%|███████▌  | 76/100 [09:33<02:51,  7.15s/it]

Getting Account Information: marcelotwelve
Adding marcelotwelve information...
Getting Posts Information: marcelotwelve
Adding marcelotwelve posts information...
Waiting 5 seconds...


 77%|███████▋  | 77/100 [09:40<02:42,  7.09s/it]

Getting Account Information: 433
Adding 433 information...
Getting Posts Information: 433
Adding 433 posts information...
Waiting 5 seconds...


 78%|███████▊  | 78/100 [09:49<02:43,  7.45s/it]

Getting Account Information: manchesterunited
Adding manchesterunited information...
Getting Posts Information: manchesterunited
Adding manchesterunited posts information...
Waiting 5 seconds...


 79%|███████▉  | 79/100 [09:58<02:50,  8.10s/it]

Getting Account Information: karolg
Adding karolg information...
Getting Posts Information: karolg
Adding karolg posts information...
Waiting 5 seconds...


 80%|████████  | 80/100 [10:05<02:36,  7.84s/it]

Getting Account Information: zacefron
Adding zacefron information...
Getting Posts Information: zacefron
Adding zacefron posts information...
Waiting 5 seconds...


 81%|████████  | 81/100 [10:12<02:22,  7.51s/it]

Getting Account Information: beingsalmankhan
Adding beingsalmankhan information...
Getting Posts Information: beingsalmankhan
Adding beingsalmankhan posts information...
Waiting 5 seconds...


 82%|████████▏ | 82/100 [10:19<02:10,  7.27s/it]

Getting Account Information: iamzlatanibrahimovic
Adding iamzlatanibrahimovic information...
Getting Posts Information: iamzlatanibrahimovic
Adding iamzlatanibrahimovic posts information...
Waiting 5 seconds...


 83%|████████▎ | 83/100 [10:26<02:00,  7.11s/it]

Getting Account Information: 9gag
Adding 9gag information...
Getting Posts Information: 9gag
Adding 9gag posts information...
Waiting 5 seconds...


 84%|████████▍ | 84/100 [10:35<02:02,  7.64s/it]

Getting Account Information: whinderssonnunes
Adding whinderssonnunes information...
Getting Posts Information: whinderssonnunes
Adding whinderssonnunes posts information...
Waiting 5 seconds...


 85%|████████▌ | 85/100 [10:41<01:51,  7.43s/it]

Getting Account Information: thv
Adding thv information...
Getting Posts Information: thv
Adding thv posts information...
Waiting 5 seconds...


 86%|████████▌ | 86/100 [10:48<01:42,  7.29s/it]

Getting Account Information: bellahadid
Adding bellahadid information...
Getting Posts Information: bellahadid
Adding bellahadid posts information...
Waiting 5 seconds...


 87%|████████▋ | 87/100 [10:56<01:34,  7.27s/it]

Getting Account Information: paulpogba
Adding paulpogba information...
Getting Posts Information: paulpogba
Adding paulpogba posts information...
Waiting 5 seconds...


 88%|████████▊ | 88/100 [11:03<01:25,  7.14s/it]

Getting Account Information: juventus
Adding juventus information...
Getting Posts Information: juventus
Adding juventus posts information...
Waiting 5 seconds...


 89%|████████▉ | 89/100 [11:11<01:23,  7.59s/it]

Getting Account Information: leonardodicaprio
Adding leonardodicaprio information...
Getting Posts Information: leonardodicaprio
Adding leonardodicaprio posts information...
Waiting 5 seconds...


 90%|█████████ | 90/100 [11:18<01:13,  7.39s/it]

Getting Account Information: dishapatani
Adding dishapatani information...
Getting Posts Information: dishapatani
Adding dishapatani posts information...
Waiting 5 seconds...


 91%|█████████ | 91/100 [11:25<01:05,  7.30s/it]

Getting Account Information: sergioramos
Adding sergioramos information...
Getting Posts Information: sergioramos
Adding sergioramos posts information...
Waiting 5 seconds...


 92%|█████████▏| 92/100 [11:33<00:59,  7.49s/it]

Getting Account Information: zara
Adding zara information...
Getting Posts Information: zara
Adding zara posts information...
Waiting 5 seconds...


 93%|█████████▎| 93/100 [11:40<00:51,  7.30s/it]

Getting Account Information: chrishemsworth
Adding chrishemsworth information...
Getting Posts Information: chrishemsworth
Adding chrishemsworth posts information...
Waiting 5 seconds...


 94%|█████████▍| 94/100 [11:47<00:43,  7.25s/it]

Getting Account Information: tatawerneck
Adding tatawerneck information...
Getting Posts Information: tatawerneck
Adding tatawerneck posts information...
Waiting 5 seconds...


 95%|█████████▌| 95/100 [11:54<00:35,  7.19s/it]

Getting Account Information: robertdowneyjr
Adding robertdowneyjr information...
Getting Posts Information: robertdowneyjr
Adding robertdowneyjr posts information...
Waiting 5 seconds...


 96%|█████████▌| 96/100 [12:01<00:28,  7.12s/it]

Getting Account Information: paulodybala
Adding paulodybala information...
Getting Posts Information: paulodybala
Adding paulodybala posts information...
Waiting 5 seconds...


 97%|█████████▋| 97/100 [12:09<00:21,  7.27s/it]

Getting Account Information: chanelofficial
Adding chanelofficial information...
Getting Posts Information: chanelofficial
Adding chanelofficial posts information...
Waiting 5 seconds...


 98%|█████████▊| 98/100 [12:16<00:14,  7.29s/it]

Getting Account Information: ladygaga
Adding ladygaga information...
Getting Posts Information: ladygaga
Adding ladygaga posts information...
Waiting 5 seconds...


 99%|█████████▉| 99/100 [12:23<00:07,  7.17s/it]

Getting Account Information: sunnyleone
Adding sunnyleone information...
Getting Posts Information: sunnyleone
Adding sunnyleone posts information...
Waiting 5 seconds...


100%|██████████| 100/100 [12:30<00:00,  7.51s/it]


In [15]:
session = {
            "csrf_token": csrf_token,
            "session_id": session_id
        }

# request headers mimicking a browser session
headers = {
            'x-csrftoken': session['csrf_token'],
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest',
            'Referer': 'https://www.instagram.com/accounts/login/?',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'X-Instagram-AJAX': 'c6412f1b1b7b',
            'X-IG-App-ID': '936619743392459',
            'X-ASBD-ID': '198387',
            'X-IG-WWW-Claim': '0',
            'Origin': 'https://www.instagram.com',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Accept-Encoding': 'gzip, deflate, br',
            'Upgrade-Insecure-Requests': '1',
            'Sec-Fetch-User': '?1',
            'TE': 'trailers'
        }

cookies = {
            "sessionid": session['session_id'],
            "csrftoken": session['csrf_token']
        }
url = 'https://www.instagram.com/championsleague/?__a=1&__d=dis'
res = requests.get(url, headers=headers, cookies=cookies)
# TODO: add error handling based on response codes (reference: InstagramBot.py)
res

<Response [400]>

In [7]:
data = res.json()

In [17]:
res.text

'{"message":"checkpoint_required","checkpoint_url":"https://www.instagram.com/challenge/?next=/championsleague/%253F__a%253D1%2526__d%253Ddis","lock":false,"flow_render_type":0,"status":"fail"}'
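
The 400 response above carries a `checkpoint_required` message, so calling `res.json()` and indexing into `graphql` would not yield account data. Below is a minimal sketch of the error handling mentioned in the comment, assuming only the payload shape shown above. The function name `classify_response` and the labels it returns are hypothetical and are not taken from `InstagramBot.py`.

```python
import json


def classify_response(status_code, body_text):
    """Return a short label describing how to treat a profile response.

    The labels are illustrative only; they are not part of any Instagram API.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    # Instagram asked for a manual challenge (as in the output above)
    if payload.get("message") == "checkpoint_required":
        return "checkpoint"
    if status_code == 429:
        return "rate_limited"
    if status_code != 200:
        return "http_error"
    # a 200 without the expected top-level key cannot be parsed further
    if "graphql" not in payload:
        return "unexpected_payload"
    return "ok"


# shortened copy of the payload printed above
sample = '{"message":"checkpoint_required","lock":false,"status":"fail"}'
print(classify_response(400, sample))
```

In the notebook this could be called as `classify_response(res.status_code, res.text)` before any attempt to read `data['graphql']`.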

In [8]:
# this section needs to be changed to send requests and process the JSON in the response
with open('Data/account_response.json', 'r') as f:
    data = json.load(f)

In [8]:
followers = data['graphql']['user']['edge_followed_by']['count']
following = data['graphql']['user']['edge_follow']['count']
ar_effect = data['graphql']['user']['has_ar_effects']
id = data['graphql']['user']['id']
type_business = data['graphql']['user']['is_business_account']
type_professional = data['graphql']['user']['is_professional_account']
category = data['graphql']['user']['category_name']
verified = data['graphql']['user']['is_verified']
reel_count = data['graphql']['user']['edge_felix_video_timeline']['count']
media_count = data['graphql']['user']['edge_owner_to_timeline_media']['count']
username = data['graphql']['user']['username']
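
The chained lookups above raise a `KeyError` as soon as one level is missing, which is what would happen with a checkpoint payload that has no `graphql` key. A minimal sketch of a tolerant lookup follows, assuming a fallback value is acceptable for the feature in question. The helper name `dig` and the sample dictionary are hypothetical.

```python
def dig(mapping, *keys, default=None):
    """Walk nested dictionaries and return `default` if any key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# made-up payload with the same nesting as the account response
sample = {
    "graphql": {
        "user": {
            "edge_followed_by": {"count": 600},
            "category_name": None,
        }
    }
}

followers = dig(sample, "graphql", "user", "edge_followed_by", "count", default=0)
following = dig(sample, "graphql", "user", "edge_follow", "count", default=0)
print(followers, following)
```

A stored `None` (as for `category_name` here) is returned unchanged, so missing keys and explicit nulls may still need separate handling downstream.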

In [9]:
reel_view_list = []
reel_like_list = []
reel_comment_list = []
reel_duration_list = []
reel_timestamp_list = []

media_like_list = []
media_comment_list = []
media_timestamp_list = []

for video in data['graphql']['user']['edge_felix_video_timeline']['edges']:
    reel_view_list.append(video['node']['video_view_count'])
    reel_comment_list.append(video['node']['edge_media_to_comment']['count'])
    reel_timestamp_list.append(video['node']['taken_at_timestamp'])
    reel_like_list.append(video['node']['edge_liked_by']['count'])
    reel_duration_list.append(video['node']['video_duration'])

for medium in data['graphql']['user']['edge_owner_to_timeline_media']['edges']:
    media_comment_list.append(medium['node']['edge_media_to_comment']['count'])
    media_timestamp_list.append(medium['node']['taken_at_timestamp'])
    media_like_list.append(medium['node']['edge_liked_by']['count'])

reel_duration_list = [0 if duration is None else duration for duration in reel_duration_list]

In [12]:
reel_duration_list = [0 if duration is None else duration for duration in reel_duration_list]
reel_duration_list

[40.966, 8.133, 60.0, 254.8, 15.1, 0, 15.166, 205.5, 60.0, 23.1, 825.733, 42.8]

In [10]:
reel_utc_list = [datetime.utcfromtimestamp(ts) for ts in reel_timestamp_list]
media_utc_list = [datetime.utcfromtimestamp(ts) for ts in media_timestamp_list]

reel_utc_difference_list = [reel_utc_list[i] - reel_utc_list[i+1] for i in range(len(reel_utc_list) - 1)]
media_utc_difference_list = [media_utc_list[i] - media_utc_list[i+1] for i in range(len(media_utc_list) - 1)]

# mean gap between consecutive posts, in days (86,400 seconds per day)
if reel_count > 1:
    reel_frequency = np.mean(reel_utc_difference_list).total_seconds() / 86_400
else:
    reel_frequency = 0
media_frequency = np.mean(media_utc_difference_list).total_seconds() / 86_400

reel_view_mean = np.mean(reel_view_list)
reel_like_mean = np.mean(reel_like_list)
reel_comment_mean = np.mean(reel_comment_list)
reel_duration_mean = np.mean(reel_duration_list)

media_like_mean = np.mean(media_like_list)
media_comment_mean = np.mean(media_comment_list)

entry_lst = [id, username, category, followers, following, ar_effect, type_business, type_professional, verified, reel_count, reel_view_mean, reel_comment_mean, reel_like_mean, reel_duration_mean, reel_frequency, media_count, media_comment_mean, media_like_mean, media_frequency]

accounts_df = pd.DataFrame([entry_lst] ,columns=['id', 'username', 'category_name', 'follower', 'following', 'ar_effect', 'type_business', 'type_professional', 'verified', 'reel_count', 'reel_avg_view', 'reel_avg_comment', 'reel_avg_like', 'reel_avg_duration', 'reel_frequency', 'media_count', 'media_avg_comment', 'media_avg_like', 'media_frequency'])

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
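
The frequency features in the cell above are the mean gap between consecutive posts, expressed in days. Here is a minimal sketch of the same calculation on raw Unix timestamps, assuming they are ordered newest first as the subtraction order in the cell implies. The helper name `mean_gap_days` is hypothetical and the sample timestamps are made up.

```python
SECONDS_PER_DAY = 86_400


def mean_gap_days(timestamps):
    """Mean gap in days between consecutive timestamps (newest first)."""
    if len(timestamps) < 2:
        # a single post (or none) has no gap to measure
        return 0.0
    gaps = [newer - older for newer, older in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps) / SECONDS_PER_DAY


# three made-up posts, exactly one day apart
sample = [1_700_172_800, 1_700_086_400, 1_700_000_000]
print(mean_gap_days(sample))
```

Guarding on the length of the timestamp list, rather than on the account-level count, also covers the case where the response contains fewer edges than the profile reports.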

In [13]:
accounts_df

| | id | username | category_name | follower | following | ar_effect | type_business | type_professional | verified | reel_count | reel_avg_view | reel_avg_comment | reel_avg_like | reel_avg_duration | reel_frequency | media_count | media_avg_comment | media_avg_like | media_frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7719696 | arianagrande | Musician | 367888326 | 600 | False | False | True | True | 1309 | 9167998.0 | 40.5 | 2158336.25 | 49.31075 | 18.352435 | 4987 | 2367.666667 | 3577108.0 | 13.360044 |


In [32]:
def flatten(lst):
    """Flatten an arbitrarily nested Python list into a 1D list.

    Args:
        lst (list): nested (multi-dimensional) Python list

    Returns:
        list: flattened list
    """
    rt = []
    for i in lst:
        if isinstance(i, list):
            rt.extend(flatten(i))
        else:
            rt.append(i)
    return rt

# each row has the structure:
#   shortcode, post_type, username, like, comment, objects
posts_lst = []
for post in data['graphql']['user']['edge_owner_to_timeline_media']['edges']:
    temp_lst = []
    objects = []
    temp_lst.append(post['node']['shortcode'])
    temp_lst.append(post['node']['__typename'])
    temp_lst.append(data['graphql']['user']['username'])
    temp_lst.append(post['node']['edge_liked_by']['count'])
    temp_lst.append(post['node']['edge_media_to_comment']['count'])
    if post['node']['__typename'] in ('GraphImage', 'GraphSidecar'):
        # split object-detection output
        if post['node']['accessibility_caption'] is None:
            # posts without an accessibility caption are skipped entirely
            continue
        objects = post['node']['accessibility_caption'].split('.')[1]
        # parse only when the second sentence is not empty
        if objects:
            try:
                # cutting objects
                objects = objects.split('of')[1]
                objects = objects.split('and', 1)
                objects[0] = objects[0].split(',')
                if 'text' in objects[1]:
                    objects[1] = 'text'
            except IndexError:
                # captions that do not match the expected pattern are skipped
                continue
            # flattening the objects list to make the dimension 1D
            objects = flatten(objects)
            # stripping leading and trailing spaces from list items
            objects = [item.strip() for item in objects]
        else:
            objects = []
    # padding the objects list, we set the limit to 6 objects
    objects += ['No Object'] * (6 - len(objects))
    if len(objects) > 6:
        objects = objects[:6]
    temp_lst.append(objects)
    posts_lst.append(flatten(temp_lst))

# creating temporary dataframe for posts of this account
temp_df = pd.DataFrame(posts_lst, columns=[
    'shortcode',
    'post_type',
    'username',
    'like',
    'comment',
    'object_1',
    'object_2',
    'object_3',
    'object_4',
    'object_5',
    'object_6'
])
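
The object extraction in the loop above can also be sketched as a standalone function, which makes it easier to try on a single caption. The sample caption below is an assumption about the typical shape of `accessibility_caption` (a credit sentence followed by "May be an image of ..."), and `parse_objects` is a hypothetical name. Unlike the loop, this sketch splits on the whole words ` of ` and ` and `, so a word such as "roof" or "sand" inside an object name does not break the parsing.

```python
MAX_OBJECTS = 6
PLACEHOLDER = "No Object"


def parse_objects(caption, limit=MAX_OBJECTS):
    """Extract up to `limit` object names from a caption, padded with a placeholder."""
    empty = [PLACEHOLDER] * limit
    if not caption:
        return empty
    sentences = caption.split(".")
    if len(sentences) < 2:
        return empty
    # keep what follows "... image of "
    _, separator, listing = sentences[1].partition(" of ")
    if not separator:
        return empty
    # "a, b and c" -> head "a, b", tail "c"
    head, _, tail = listing.partition(" and ")
    if "text" in tail:
        # collapse "text that says ..." to a single label, as in the loop above
        tail = "text"
    items = [part.strip() for part in head.split(",")]
    items.append(tail.strip())
    items = [item for item in items if item][:limit]
    return items + [PLACEHOLDER] * (limit - len(items))


# assumed caption shape, not copied from a real response
sample = "Photo by Example on March 23, 2023. May be an image of 1 person, magazine and text."
print(parse_objects(sample))
```

Returning a padded list for every input, including `None`, keeps one row per post; whether such posts should instead be dropped, as the loop does, is a modelling choice.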

In [33]:
temp_df

| | shortcode | post_type | username | like | comment | object_1 | object_2 | object_3 | object_4 | object_5 | object_6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CqLUeUppDYs | GraphSidecar | beyonce | 3180843 | 25102 | 2 people | makeup | people kissing | suit | overcoat | dinner jacket |
| 1 | CqLUQ_upwT0 | GraphImage | beyonce | 1539359 | 18315 | 1 person | magazine | text | No Object | No Object | No Object |
| 2 | Cp3vzkhp1Ug | GraphSidecar | beyonce | 4515558 | 52969 | 1 person | makeup | dress | No Object | No Object | No Object |
| 3 | CoZ98CJDHRZ | GraphVideo | beyonce | 4753985 | 88044 | No Object | No Object | No Object | No Object | No Object | No Object |
| 4 | CoZJBmpuIlD | GraphSidecar | beyonce | 3586900 | 27435 | miniskirt | drawstring | top | No Object | No Object | No Object |
| 5 | CoTojVarwg_ | GraphSidecar | beyonce | 2860019 | 31583 | one or more people | makeup | No Object | No Object | No Object | No Object |
| 6 | CoRHCpauzUG | GraphImage | beyonce | 2144468 | 22590 | one or more people | makeup | dress | miniskirt | No Object | No Object |
| 7 | CoHxOQhrTHX | GraphImage | beyonce | 8670783 | 225761 | costume | tinfoil | fishnet stockings | headdress | No Object | No Object |
| 8 | CkhUy2bL0_9 | GraphImage | beyonce | 5640978 | 76182 | No Object | No Object | No Object | No Object | No Object | No Object |
