## Introduction

This project applies natural language processing (NLP) techniques to a dataset of over 1,000 comments from Chris Bumstead's Instagram posts and reels, collected with Apify. The comments were scraped and saved to a CSV file for further analysis.

The goal is to gain insight into the topics, sentiment, and linguistic patterns in these comments. Applying NLP algorithms lets us extract structured, meaningful information from unstructured text.

Specifically, this project will clean and preprocess the Instagram comments, then apply techniques like sentiment analysis, topic modeling, and entity extraction. Visualizations will also be generated to summarize the key findings.
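
As an illustration of the cleaning step, here is a minimal sketch built on the standard library's `re` module. The rules are assumptions, not the project's actual preprocessing: lowercase the text, drop URLs and `@mentions`, and collapse whitespace. Emoji are deliberately kept because they can carry sentiment signal in Instagram comments. The function name `clean_comment` is hypothetical.

```python
import re

# Assumed cleaning rules for Instagram comments (illustrative, not the project's own)
URL_RE = re.compile(r"https?://\S+")   # http(s) links up to the next whitespace
MENTION_RE = re.compile(r"@\w+")       # @username mentions
WHITESPACE_RE = re.compile(r"\s+")     # runs of spaces, tabs, newlines


def clean_comment(text: str) -> str:
    """Lowercase a comment, remove URLs and @mentions, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_comment("Let's GO @cbum!!  https://example.com 🔥"))  # let's go !! 🔥
```

Punctuation such as `!!` is left in place here. Whether to strip it depends on the downstream model, since some sentiment tools use it as an intensity cue.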

This provides both a broad overview and a detailed view of the language used by Bumstead's followers on Instagram, and serves as a practical demonstration of how NLP can extract value from social media data.

## Import Libraries

```python
from apify_client import ApifyClient  # Apify API client
from dotenv import load_dotenv        # reads variables from a .env file
import requests
import os
import pandas as pd
```

## Implementation

```python
load_dotenv()  # load variables from a .env file into os.environ
```

```
True
```
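
`load_dotenv()` returns `True` above, which means it found variables in a `.env` file and copied them into `os.environ`, where `os.getenv` can read them. To make that mechanism concrete, here is a simplified stand-in written with the standard library only. It is not python-dotenv's implementation, which also handles `export` prefixes, escape sequences, and variable interpolation, and the helper names `parse_env_lines` and `load_env_file` are hypothetical. Like `load_dotenv()` with its default `override=False`, it leaves variables that are already set untouched. The `.env` file is assumed to contain a line such as `API_TOKEN=...`.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and comments, strip surrounding quotes."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")
    return env


def load_env_file(path: str = ".env") -> bool:
    """Hypothetical simplified load_dotenv(): existing environment variables win."""
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as fh:
        env = parse_env_lines(fh)
    for key, value in env.items():
        os.environ.setdefault(key, value)  # do not override variables already set
    return bool(env)
```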

Read the Apify API token from the `.env` file.

```python
api_token = os.getenv("API_TOKEN")  # None if API_TOKEN is not set
```

Initialize the Apify API client with the token.

```python
apify_client = ApifyClient(api_token)
```

```python
# Instagram posts and reels whose comments will be scraped
run_input = {
    "directUrls": [
        "https://www.instagram.com/p/CwBOUoZSDrf/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/Cvxj4Rcga_i/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/Cuw7en7Akhg/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CuKYRiqgMh4/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CtXCEXDgtgl/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C1hw790AJsR/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C1U21__gmP9/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C0e6xcLAPpC/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C0HUdmGt66Q/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/Cz67N84Pezn/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C2KbLaHADNM/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C2Ih9JxgSLD/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C2C3r_rgPu1/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C17pLDogJUV/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C1z5hg5g22P/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C1sf3zAAH-j/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C1M3Co8gzcz/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C1FeQ1dAXxB/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C066yhrARUe/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/C04SuDAA4nB/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C0hXiXyAYU9/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/C0W_xgMABGm/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/CzpJZlcgb37/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/reel/Czg2lSPgslD/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzeOV8Kgb57/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzTrjXSA4FD/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzRy6VMgZyk/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzQ-mWiACOF/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzPED9Fg_GX/?utm_source=ig_web_copy_link",
        "https://www.instagram.com/p/CzL83b3AIsu/?utm_source=ig_web_copy_link",
    ]
}
```
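
Every URL above still carries the share-link tracking parameter `?utm_source=ig_web_copy_link`, and the notebook passes the URLs unchanged. The standard-library sketch below could produce a cleaner input list or confirm that no post was pasted twice. It assumes Instagram URLs of the form `/p/<shortcode>/` or `/reel/<shortcode>/`, as in the list above. It does not verify that the actor accepts URLs without the query string. The helper names `strip_tracking`, `shortcode`, and `find_duplicate_posts` are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop the query string (e.g. ?utm_source=...) and any fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def shortcode(url: str) -> str:
    """Return the path segment after /p/ or /reel/ (assumed URL layout)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[1]


def find_duplicate_posts(urls: List[str]) -> List[str]:
    """List shortcodes that appear more than once in the input URLs."""
    seen, dupes = set(), []
    for url in urls:
        code = shortcode(url)
        if code in seen:
            dupes.append(code)
        seen.add(code)
    return dupes
```

Usage would be along the lines of `find_duplicate_posts(run_input["directUrls"])`, where an empty list means every post appears only once.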

Run the Apify actor with this input and wait for the run to finish.

```python
# .call() starts the actor run and blocks until it finishes, returning the run's metadata
run = apify_client.actor("SbK00X0JYCPblD2wp").call(run_input=run_input)
```
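
`.call()` returns the run's metadata as a dictionary, and the next cell reads its `defaultDatasetId`. A run can also end without succeeding, so it may be worth checking the run's `status` field before reading results. Apify reports `"SUCCEEDED"` for a completed run. The helper `dataset_id_from_run` below is a hypothetical sketch of that check. It only inspects the returned dictionary, makes no API calls, and also guards against a missing (`None`) result.

```python
from typing import Any, Dict, Optional


def dataset_id_from_run(run: Optional[Dict[str, Any]]) -> str:
    """Return the run's default dataset ID, failing loudly if the run did not succeed."""
    if run is None:
        raise RuntimeError("Actor run returned no run object")
    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run finished with status {status!r}")
    return run["defaultDatasetId"]
```

With this helper, the next cell could iterate over `apify_client.dataset(dataset_id_from_run(run))` instead of indexing `run` directly.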

Fetch the scraped items from the run's default dataset and load them into a DataFrame.

```python
# Each item is one record (a dict) from the actor's output dataset
data = []
for item in apify_client.dataset(run["defaultDatasetId"]).iterate_items():
    data.append(item)
```

```python
data = pd.DataFrame(data)
```
Remove duplicate comments, keeping the first occurrence of each distinct `text` value.

```python
data = data.drop_duplicates(subset=["text"])
```
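
`drop_duplicates(subset=["text"])` keeps the first row for each distinct comment text and discards later rows with exactly the same text, regardless of the other columns. A plain-Python equivalent makes that rule explicit. Exact-match deduplication also collapses identical short comments posted by different users, for example several followers each writing only an emoji. This can shift word and emoji frequencies in later analysis. The function name `drop_duplicate_text` is hypothetical.

```python
from typing import Any, Dict, List


def drop_duplicate_text(rows: List[Dict[str, Any]], key: str = "text") -> List[Dict[str, Any]]:
    """Keep the first row for each distinct value of `key`, mirroring keep='first'."""
    seen = set()
    kept = []
    for row in rows:
        value = row.get(key)
        if value in seen:
            continue  # a later row with the same text is dropped
        seen.add(value)
        kept.append(row)
    return kept
```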

```python
data.shape  # (rows, columns) after deduplication
```

```
(1119, 7)
```

Save the deduplicated comments to `../data.csv` in the parent directory, without the DataFrame index.

```python
data.to_csv("../data.csv", index=False)
```