<img width="8%" alt="Instagram.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/Instagram.png" style="border-radius: 15%">

# Instagram - Extract details from account
<a href="https://bit.ly/3JyWIk6">Give Feedback</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=Instagram+-+Extract+details+from+account:+Error+short+description">Bug report</a>

**Tags:** #instagram #snippet #content

**Author:** [Varsha Kumar](https://www.linkedin.com/in/varsha-kumar-590466305/)

**Last update:** 2024-07-09 (Created: 2024-07-04)

**Description:** This notebook extracts profile details and the latest posts from an Instagram account using the Apify Instagram Profile Scraper, and saves them to two CSV files.

### How to retrieve an API token with Apify

1. Go to https://apify.com.
2. Click "Sign up for free" and sign up with your Google account.
3. Once your account is created, open "Settings" in the left panel.
4. Select the "Integrations" tab, which shows the personal API token generated automatically when you signed up.
5. Copy the token; you will assign it to `apify_token` in the setup step.
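Rather than pasting the token into the notebook, you may prefer to keep it out of the file entirely. A minimal sketch, assuming you have exported the token under an environment variable named `APIFY_TOKEN` (the variable name and the `load_apify_token` helper are assumptions, not part of this notebook):

```python
import os


def load_apify_token(env_var: str = "APIFY_TOKEN") -> str:
    """Read the Apify API token from an environment variable (hypothetical helper).

    APIFY_TOKEN is an assumed name; set it yourself before starting Jupyter,
    e.g. `export APIFY_TOKEN=...` in your shell.
    """
    token = os.environ.get(env_var, "").strip()  # tolerate stray whitespace
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Apify API token.")
    return token
```

With the variable set, `apify_token = load_apify_token()` could replace the hard-coded string in the setup cell.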

## Input

### Import libraries

In [1]:
import requests
import pandas as pd
import json

### Setup variables
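- `apify_token`: your personal Apify API token, used to authenticate requests
- `instagram_profile_url`: URL of the Instagram profile to scrape
- `output_csv1`: file name of the account details CSV
- `output_csv2`: file name of the posts CSV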

In [2]:
apify_token = "<YOUR_APIFY_TOKEN>"  # paste the token from Settings > Integrations
instagram_profile_url = "https://www.instagram.com/naaslife/"
# Username part of the URL, e.g. "naaslife" (works with or without a trailing slash)
username = instagram_profile_url.rstrip('/').split('/')[-1]
output_csv1 = f"{username}_instagram_account.csv"
output_csv2 = f"{username}_instagram_posts.csv"
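The output file names depend on extracting the username from the profile URL. A sketch of the same idea using the standard library's `urllib.parse`, so that a trailing slash or a query string such as `?hl=en` does not change the result; `profile_username` and `output_paths` are hypothetical helper names:

```python
from urllib.parse import urlparse


def profile_username(profile_url: str) -> str:
    """Return the username segment of an Instagram profile URL (hypothetical helper)."""
    path = urlparse(profile_url).path  # e.g. "/naaslife/" (query string is dropped)
    parts = [part for part in path.split("/") if part]
    if not parts:
        raise ValueError(f"No username found in {profile_url!r}")
    return parts[0]


def output_paths(profile_url: str) -> tuple[str, str]:
    """Build the account and posts CSV file names for a profile (hypothetical helper)."""
    username = profile_username(profile_url)
    return f"{username}_instagram_account.csv", f"{username}_instagram_posts.csv"
```

For example, `output_csv1, output_csv2 = output_paths(instagram_profile_url)`.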

## Model

### Scrape instagram data

In [3]:
def get_instagram_data(apify_token, instagram_profile_url):
    # Extract the username from the profile URL (trailing slash optional)
    username = instagram_profile_url.rstrip('/').split('/')[-1]
    
    # Apify endpoint that runs the Instagram Profile Scraper actor and returns its dataset items
    api_url = "https://api.apify.com/v2/acts/apify~instagram-profile-scraper/run-sync-get-dataset-items"

    # Actor input: the profile(s) to scrape
    payload = {
        "usernames": [username],  # the actor takes a list of usernames
        "proxyConfig": {
            "useApifyProxy": True
        }
    }

    # Authenticate with the Apify API token
    headers = {
        "Authorization": f"Bearer {apify_token}",
        "Content-Type": "application/json"
    }

    # Run the actor synchronously; scraping can take a while, so allow a generous timeout (seconds)
    response = requests.post(api_url, json=payload, headers=headers, timeout=300)
    response.raise_for_status()  # fail loudly on HTTP errors such as an invalid token

    # The response body is a JSON list with one item per scraped profile
    return response.json()

# Build one row of the posts dataframe from a post's fields
def get_posts(
    ownerUsername,
    ownerId,
    pid,
    post_type,
    caption,
    hashtags,
    mentions,
    url,
    comments_count,
    likes_count,
    timestamp
):
    return {
        "OWNER_USERNAME": ownerUsername,
        "OWNER_ID": ownerId,
        "ID": pid,
        "POST_TYPE": post_type,
        "CAPTION": caption,
        "HASHTAGS": hashtags,
        "MENTIONS": mentions,
        "URL": url,
        "COMMENTS_COUNT": comments_count,
        "LIKES_COUNT": likes_count,
        "TIMESTAMP": timestamp
    }
    
profile_data = get_instagram_data(apify_token, instagram_profile_url)
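The next cells index `profile_data[0]` directly, so an empty or failed response surfaces as an `IndexError` or `KeyError`. A small guard, assuming a successful run returns a JSON list of profile items and a failed one returns an object with an `error` key, turns that into a readable message; `first_profile` is a hypothetical helper:

```python
from typing import Any


def first_profile(data: Any) -> dict:
    """Return the first profile item from an Apify dataset response (hypothetical helper).

    Assumes success yields a list of profile dicts and failure yields
    a dict carrying an "error" key.
    """
    if isinstance(data, dict) and "error" in data:
        error = data["error"]
        message = error.get("message", error) if isinstance(error, dict) else error
        raise RuntimeError(f"Apify returned an error: {message}")
    if not isinstance(data, list) or not data:
        raise RuntimeError("Apify returned no profile data; check the username and token.")
    return data[0]
```

For example, `profile = first_profile(profile_data)` before building the account dataframe.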

## Output

### Account dataframe

In [4]:
# All account-level fields come from the first (and only) profile item
profile = profile_data[0]

data1 = [{
    "ID": profile['id'],
    "USERNAME": profile['username'],
    "URL": profile['url'],
    "BIO": profile['biography'],
    "FOLLOWERS": profile['followersCount'],
    "FOLLOWING": profile['followsCount'],
    "PRIVATE": profile['private'],
    "POST_COUNT": profile['postsCount']
}]

df1 = pd.DataFrame(data1)
df1

|   | ID | USERNAME | URL | BIO | FOLLOWERS | FOLLOWING | PRIVATE | POST_COUNT |
|---|---|---|---|---|---|---|---|---|
| 0 | 49645556825 | naaslife | https://www.instagram.com/naaslife | Unlocking the power of data, automation, and A... | 78 | 102 | False | 17 |
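The account cell looks up each field with `[...]`, so a single absent key raises `KeyError`. A sketch of the same dataframe built from a column-to-key mapping with `dict.get`, so missing fields become empty cells instead; `ACCOUNT_FIELDS` and `account_frame` are hypothetical names, and the source keys are the ones used in the cell above:

```python
import pandas as pd

# Output column -> key in the Apify profile item (keys as used in the account cell)
ACCOUNT_FIELDS = {
    "ID": "id",
    "USERNAME": "username",
    "URL": "url",
    "BIO": "biography",
    "FOLLOWERS": "followersCount",
    "FOLLOWING": "followsCount",
    "PRIVATE": "private",
    "POST_COUNT": "postsCount",
}


def account_frame(profile: dict) -> pd.DataFrame:
    """One-row account DataFrame; missing keys become empty cells rather than raising KeyError."""
    row = {column: profile.get(key) for column, key in ACCOUNT_FIELDS.items()}
    return pd.DataFrame([row], columns=list(ACCOUNT_FIELDS))
```

`df1 = account_frame(profile_data[0])` yields the same columns as the table above.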


### Save first dataframe to csv

In [5]:
df1.to_csv(output_csv1, index=False)

### Posts dataframe

In [6]:
data2 = []

# One row per post in the profile's latestPosts list
for post in profile_data[0]['latestPosts']:
    data_post = get_posts(
        post["ownerUsername"],
        post["ownerId"],
        post["id"],
        post["type"],
        post["caption"],
        post["hashtags"],
        post["mentions"],
        post["url"],
        post["commentsCount"],
        post["likesCount"],
        post["timestamp"]
    )
    data2.append(data_post)

df2 = pd.DataFrame(data2)
df2

|   | OWNER_USERNAME | OWNER_ID | ID | POST_TYPE | CAPTION | HASHTAGS | MENTIONS | URL | COMMENTS_COUNT | LIKES_COUNT | TIMESTAMP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | naaslife | 49645556825 | 3278939566399457755 | Image | The real magic happens when you merge this AI ... | [data, ai] | [] | https://www.instagram.com/p/C2BIim_Nonb/ | 0 | 0 | 2024-01-12T22:54:37.000Z |
| 1 | naaslife | 49645556825 | 3157000429711370041 | Image | The Lean Data Journal, article #1 is out! \n\n... | [1, data, automation, ai, datascience, artific... | [] | https://www.instagram.com/p/CvP6yoQs6c5/ | 1 | 3 | 2023-07-28T17:03:20.000Z |
| 2 | naaslife | 49645556825 | 3023800274288630756 | Image | In the desert sands, a pipeline flows\nOil rus... | [] | [] | https://www.instagram.com/p/Cn2slQks9vk/ | 0 | 2 | 2023-01-25T22:18:24.000Z |
| 3 | naaslife | 49645556825 | 3023165800621664596 | Image | A pipeline of water in the jungle,\nA sight bo... | [AvatarWorld, Nature, ManVsWild, JungleLife, W... | [] | https://www.instagram.com/p/Cn0cUc7KelU/ | 4 | 4 | 2023-01-25T01:17:49.000Z |
| 4 | naaslife | 49645556825 | 3023157023520418465 | Image | Every data pipeline is different,\n\nand that'... | [] | [] | https://www.instagram.com/p/Cn0aUunKWqh/ | 0 | 2 | 2023-01-25T01:00:22.000Z |
| 5 | naaslife | 49645556825 | 2961429883488333870 | Image | 🎃 Happy Halloween to all the Pandas coders who... | [halloween, candy, python, coders, jupyternote... | [] | https://www.instagram.com/p/CkZHMnojHAu/ | 0 | 8 | 2022-10-31T20:59:34.000Z |
| 6 | naaslife | 49645556825 | 2923439672380916997 | Image | Are you interested in data & open source ?\n\n... | [opensource, data, ai, automation, analytics, ... | [] | https://www.instagram.com/p/CiSJOSajIUF/ | 0 | 5 | 2022-09-09T10:59:48.000Z |
| 7 | naaslife | 49645556825 | 2916247470160561289 | Image | Back from a break in the mountains and ready t... | [] | [] | https://www.instagram.com/p/Ch4l5-IDUSJ/ | 0 | 3 | 2022-08-30T12:50:10.000Z |
| 8 | naaslife | 49645556825 | 2884235236414899245 | Image | 📊⭐️ Do you want to build dashboards and data a... | [] | [] | https://www.instagram.com/p/CgG3KqLLMQt/ | 0 | 4 | 2022-07-17T08:47:35.000Z |
| 9 | naaslife | 49645556825 | 2861894946727707225 | Image | Wondering how to read a dataframe from your fi... | [aws, cloud, storage, S3bucket, operations, sn... | [awscloud] | https://www.instagram.com/p/Ce3fkqEMMJZ/ | 0 | 4 | 2022-06-16T13:01:25.000Z |
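`TIMESTAMP` arrives as an ISO 8601 string, so sorting or filtering by date is easier once it is parsed. A sketch, assuming the post keys used in the loop above, that builds the posts dataframe in one comprehension, converts `TIMESTAMP` to timezone-aware UTC datetimes with `pd.to_datetime`, and sorts newest first; `POST_FIELDS` and `posts_frame` are hypothetical names:

```python
import pandas as pd

# Output column -> key in each item of profile_data[0]["latestPosts"]
POST_FIELDS = {
    "OWNER_USERNAME": "ownerUsername",
    "OWNER_ID": "ownerId",
    "ID": "id",
    "POST_TYPE": "type",
    "CAPTION": "caption",
    "HASHTAGS": "hashtags",
    "MENTIONS": "mentions",
    "URL": "url",
    "COMMENTS_COUNT": "commentsCount",
    "LIKES_COUNT": "likesCount",
    "TIMESTAMP": "timestamp",
}


def posts_frame(posts: list) -> pd.DataFrame:
    """Posts DataFrame with TIMESTAMP parsed to UTC datetimes, newest post first."""
    rows = [{column: post.get(key) for column, key in POST_FIELDS.items()} for post in posts]
    df = pd.DataFrame(rows, columns=list(POST_FIELDS))
    df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], utc=True)
    return df.sort_values("TIMESTAMP", ascending=False, ignore_index=True)
```

For example, `df2 = posts_frame(profile_data[0]['latestPosts'])`.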


### Save second dataframe to csv

In [7]:
df2.to_csv(output_csv2, index=False)
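`to_csv` writes the list-valued `HASHTAGS` and `MENTIONS` columns as their Python text form (for example `['data', 'ai']`), so reading the file back yields strings, not lists. A sketch for loading the posts CSV with the lists restored via `ast.literal_eval` and the ID columns kept as exact strings; `read_posts_csv` is a hypothetical helper, and the round trip uses an in-memory buffer in place of `output_csv2`:

```python
import ast
import io

import pandas as pd


def read_posts_csv(path_or_buffer) -> pd.DataFrame:
    """Load a posts CSV written by df2.to_csv and restore its list columns (hypothetical helper)."""
    # Keep IDs as strings so they are preserved exactly as written
    df = pd.read_csv(path_or_buffer, dtype={"ID": str, "OWNER_ID": str})
    for column in ("HASHTAGS", "MENTIONS"):
        # "['data', 'ai']" -> ['data', 'ai']; empty cells become []
        df[column] = df[column].apply(lambda text: ast.literal_eval(text) if isinstance(text, str) else [])
    return df


# Round-trip example using an in-memory buffer instead of a file on disk
buffer = io.StringIO()
pd.DataFrame({
    "ID": ["3278939566399457755"],
    "OWNER_ID": ["49645556825"],
    "HASHTAGS": [["data", "ai"]],
    "MENTIONS": [[]],
}).to_csv(buffer, index=False)
buffer.seek(0)
restored = read_posts_csv(buffer)
```

With a real file, `read_posts_csv(output_csv2)` would be the equivalent call.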