In [1]:
pip install instaloader

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install --upgrade pip

Collecting pip
  Obtaining dependency information for pip from https://files.pythonhosted.org/packages/47/6a/453160888fab7c6a432a6e25f8afe6256d0d9f2cbd25971021da6491d899/pip-23.3.1-py3-none-any.whl.metadata
  Using cached pip-23.3.1-py3-none-any.whl.metadata (3.5 kB)
Using cached pip-23.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.2.1
    Uninstalling pip-23.2.1:
      Successfully uninstalled pip-23.2.1
Successfully installed pip-23.3.1
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install instaloader --upgrade

Note: you may need to restart the kernel to use updated packages.


# Task 1-1 

Given an Instagram profile URL, for example https://www.instagram.com/aashnashroff, use vision / NLP algorithms and/or heuristics to classify whether the account belongs to an individual user or a brand / organization.
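
The task starts from a profile URL, while the functions in this notebook take a username. A minimal sketch of that conversion, assuming the username is the first path segment of the URL; `username_from_url` is a hypothetical helper and not part of the original notebook:

```python
from urllib.parse import urlparse

def username_from_url(profile_url):
    """Return the first path segment of an Instagram profile URL."""
    parsed = urlparse(profile_url.strip())
    # Assumption: the username is the first non-empty path segment,
    # e.g. "/aashnashroff/" -> "aashnashroff".
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        raise ValueError(f"No username found in URL: {profile_url!r}")
    return segments[0]

print(username_from_url("https://www.instagram.com/aashnashroff"))
```

This does not check that the URL points to a profile page, so a post URL such as one beginning with `/p/` would return `p`.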

## Instaloader


`fetch_instagram_data(username)` retrieves an Instagram user's profile with the Instaloader library. It creates an `Instaloader` instance and passes that instance's context, together with the given `username`, to `Profile.from_username`. The function returns the resulting `profile` object, which holds the user's profile information, including information about their posts, followers, and following, and can be used to examine the account in more detail.

In [2]:
import instaloader

def fetch_instagram_data(username):
    L = instaloader.Instaloader()
    profile = instaloader.Profile.from_username(L.context, username)
    return profile

## Analyzing Profile Picture for Faces

`analyze_profile_picture(profile)` counts the faces in an Instagram user's profile picture. It first obtains the picture URL from the `profile` object with `get_profile_pic_url()` and sends an HTTP request for the image; the request counts as successful only if the response status code is 200. On success, the image bytes are decoded with OpenCV, the image is converted to grayscale, and a Haar cascade classifier for frontal faces detects faces in it. The function returns `faces_detected`, the number of faces found. If the request fails, the image cannot be decoded, or an exception is raised, the function prints an error message and returns 0. A comment in the code marks where logo detection could be added; it is not implemented here.

In [7]:
import cv2
import requests
import numpy as np

def analyze_profile_picture(profile):
    img_url = profile.get_profile_pic_url()

    try:
        response = requests.get(img_url, timeout=10)  # timeout avoids hanging on a stalled request
        if response.status_code == 200:
            img_data = np.frombuffer(response.content, np.uint8)
            img = cv2.imdecode(img_data, cv2.IMREAD_COLOR)

            if img is not None:
                face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
                gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
                faces_detected = len(faces)

                # You can similarly add code to detect logos if needed.

                return faces_detected
            else:
                print("Failed to decode the image.")
                return 0
        else:
            print(f"Failed to fetch the image from URL: {img_url}")
            return 0
    except Exception as e:
        print(f"An error occurred while processing the image: {str(e)}")
        return 0
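
The function above reduces the detections to a count. Each detection from `detectMultiScale` is an `(x, y, w, h)` rectangle, so one possible refinement is to measure how much of the picture the largest face covers, which could help separate a portrait from a small incidental face. The sketch below is a hypothetical helper, not part of the original notebook, and any cutoff applied to its result would be an assumption that needs tuning:

```python
def largest_face_fraction(faces, image_width, image_height):
    """Fraction of the image area covered by the largest detected face box."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("Image dimensions must be positive.")
    # Each detection is an (x, y, w, h) rectangle; an empty input yields 0.
    largest = max((int(w) * int(h) for (_x, _y, w, h) in faces), default=0)
    return largest / (image_width * image_height)

# Hand-written boxes for illustration (not real detector output).
sample_faces = [(10, 10, 50, 50), (0, 0, 20, 20)]
print(largest_face_fraction(sample_faces, 100, 100))
```

Inside `analyze_profile_picture`, the image size would be available from `img.shape`, which for a color image is `(height, width, channels)`.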


## Analyzing Text for Named Entities

`analyze_nlp_text(text)` performs Named Entity Recognition (NER) on a text input with the pre-trained NER pipeline from the Hugging Face Transformers library. The function creates the pipeline, runs it on the input text, and extracts named entities such as people, organizations, and places. The recognized entities are stored in the `keywords` variable and returned. Because no model is specified, the pipeline falls back to its default pre-trained NER model, and it is created anew on every call.

In [11]:
from transformers import pipeline

def analyze_nlp_text(text):
    nlp = pipeline("ner")
    keywords = nlp(text)
    return keywords
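
The list returned by the pipeline carries a label for each item, so the entity types can be tallied instead of only counted. A minimal sketch, assuming the items are dictionaries with an `entity` key (token-level output) or an `entity_group` key (aggregated output) and that labels use BIO prefixes such as `B-PER`; the exact label set depends on the model. `count_entity_types` is a hypothetical helper and not part of the original notebook:

```python
from collections import Counter

def count_entity_types(entities):
    """Count NER results by entity type, ignoring B-/I- prefixes."""
    counts = Counter()
    for ent in entities:
        # Aggregated pipelines report "entity_group"; token-level ones report "entity".
        label = ent.get("entity_group") or ent.get("entity", "")
        # Strip the BIO prefix ("B-PER" -> "PER") if present.
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        if label:
            counts[label] += 1
    return counts

# Hand-written sample in the shape of token-level output (not real model output).
sample = [
    {"entity": "B-PER", "word": "Jane"},
    {"entity": "I-PER", "word": "Doe"},
    {"entity": "B-LOC", "word": "Mumbai"},
]
print(count_entity_types(sample))
```

A tally like this could let a classifier treat person names and organization names differently, rather than treating every entity as the same signal.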


## Account Classification Based on Profile Data

`classify_account(profile)` takes an Instagram user's profile as input. It first calls `analyze_profile_picture(profile)` to count the faces in the profile picture and assigns that count to `face_score`. It then calls `analyze_nlp_text(profile.biography)` to extract named entities from the user's biography and sets `keyword_score` to the number of items the NER pipeline returns.

The function then checks whether `face_score` and `keyword_score` are both greater than 0. If the profile picture contains at least one face and the biography contains at least one named entity, the account is labeled "Individual". In every other case, that is, if no face is detected, no named entity is found, or both, the account is labeled "Brand/Organization". The classification therefore rests on two signals: personal attributes (faces in the profile picture) and textual information (named entities in the biography).

In [12]:
def classify_account(profile):
    faces_detected = analyze_profile_picture(profile)
    keywords = analyze_nlp_text(profile.biography)

    face_score = faces_detected  # Using the detected face count directly
    keyword_score = len(keywords)

    if face_score > 0 and keyword_score > 0:
        return "Individual"
    else:
        return "Brand/Organization"
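
To make the decision rule easier to inspect without fetching a profile or loading a model, the same condition can be applied to precomputed scores. `classify_from_scores` is a hypothetical helper that mirrors the rule above and is not part of the original notebook:

```python
def classify_from_scores(face_score, keyword_score):
    """Apply the same rule as classify_account to precomputed scores."""
    if face_score > 0 and keyword_score > 0:
        return "Individual"
    return "Brand/Organization"

# Walk through the four combinations of "signal present" / "signal absent".
for face_score, keyword_score in [(1, 2), (1, 0), (0, 3), (0, 0)]:
    label = classify_from_scores(face_score, keyword_score)
    print(face_score, keyword_score, label)
```

Only the first combination yields "Individual". A personal account whose biography contains no recognized named entity, or whose picture yields no detected face, is labeled "Brand/Organization" under this rule.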

The final cell fetches the profile of the user "aashnashroff" with `fetch_instagram_data`, classifies it with `classify_account`, and prints the result. When this notebook was run, the output was:

```
The account 'aashnashroff' is classified as: Individual
```

The result depends on the content of the account's profile picture and biography when the cell runs: the label is "Individual" only if a face and at least one named entity are both present, and "Brand/Organization" otherwise.

In [49]:
username = "aashnashroff"
profile = fetch_instagram_data(username)
classification = classify_account(profile)
print(f"The account '{username}' is classified as: {classification}")


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


The account 'aashnashroff' is classified as: Individual
