# Discovery das APIs do Instagram a partir do RapidApi

## Nós buscamos as seguintes informações para contas próprias e contas dos concorrentes


- Número de seguidores
  - Número total de seguidores para a conta
  - Data da busca 
- Publicações
  - Número total de publicações
  - IDs das publicações
  - Descrição das publicações
  - Foto/Link para a publicação
  - Data da busca
- Engajamento das publicações 
  - Definir quantas publicações são de cada tipo (Posts, Reels, Stories)
  - Para cada tipo de publicação econtrar:
    - Like
    - Comentários
    - Share

Obs:  
    
    Aparentemente o instagram limita para que todas APIs retornem apenas 12 posts por vez.
 
    Para buscar todos os posts precisaremos fazer inúmeras requisições.
 
    A partir das requisições teremos boas informações.

# Buscamos dados das seguintes marcas:

## Próprias: @
- mmartanoficial (MMartan)
- santistadecora (Santista)
- artex (Artex)

## Concorrentes: @
- artelasse (Artelasse)
- camicado (Camicado)
- casaalmeidaoficial (Casa Almeida)
- casariachuelo (Casa Riachuelo)
- karstenoficial (Karsten)
- mundodoenxoval (Mundo do Enxoval)
- trussardioficial (Trussardi)
- zeloloja (Zelo)

## APIs e endpoints:

Para buscar todos os dados talvez seja necessário o uso de mais de uma API para manter o plano gratuito.

Podemos modularizar os serviços para que usem serviços e endpoints diferentes.

Algumas APIs do RapidApi possuem retornos parecidos para endpoints de listagem de usuário ou postagens (aparentemente o limite para retornar apenas 12 postagens por requisição é padrão).

- API ATUAL https://rapidapi.com/social-api1-instagram/api/instagram-scraper-api2/playground/apiendpoint_b1301387-dc09-4b1f-ba39-b7b51d186b40

*Possíveis alternativas* 

- https://rapidapi.com/arraybobo/api/instagram-scraper-2022/playground/apiendpoint_4cede182-2b1d-4ade-a4c1-fc4c3577a01f
- https://rapidapi.com/mrngstar/api/instagram-api-20231/playground/apiendpoint_5113fc28-2703-4566-845d-3378a1f96bc7
- https://rapidapi.com/Instagapicom/api/instagram-scraper-20231/playground/apiendpoint_8a6bb604-8ced-4947-8560-50c221779c08
- 

## Objetos (DTO)

- DTOs para as contas e postagens

In [40]:
from dataclasses import dataclass
from typing import List, Optional, Union
import datetime

# ---------------------------------------------------------------------------------
@dataclass
class InstagramCaptionInfo:
    text: str
    created_at_utc: Union[datetime.date, str, None]
# ---------------------------------------------------------------------------------
    
# ---------------------------------------------------------------------------------
@dataclass
class InstagramMediaInfo:
    instagram_code: str # api -> code
    caption: InstagramCaptionInfo
    comment_count: Union[int, str]
    like_count: Union[int, str]
    media_name: str
    share_count: Union[int, str]
    taken_at: Union[datetime.date, str, None]
    is_video: Optional[bool] = False
    play_count: Optional[Union[int, str]] = None
    carousel_media_count: Optional[Union[int, str]] = None
# ---------------------------------------------------------------------------------

# ---------------------------------------------------------------------------------
@dataclass
class InstagramAccountInfo:
    name: str # api -> full_name
    username: str
    follower_count: Union[int, str]
    total_media: Union[int, str]
    profile_pic_url: str
    last_update: Union[datetime.date, str, None]
    instagram_user_id: Union[int, str] # api -> id
    media_info: Optional[List[InstagramMediaInfo]]
# ---------------------------------------------------------------------------------

- DTOs para as requisições

In [41]:
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class IRequestInstagramParams:
    username: str
    base_url: str
    users_url: str
    posts_url: str
    x_rapidapi_key: str
    x_rapidapi_host: str
    user_querystring: str
    media_querystring: str
    pagination_token: Optional[str] = None


## Código úteis para modularização

- Services Enum
- Usuários Enum

In [44]:
from typing import Literal

ServiceNames = Literal[
        "instagram_scrapper_api"
    ]

TrackedUsers = Literal[
    "altenburg.oficial",
    "altenburghaus",
    "artelasse",
    "artex",
    "buddemeyeroficial",
    "casaalmeidaoficial",
    "casariachuelo",
    "casa.sonno",
    "hoomybr",
    "karstenoficial",
    "mmartanoficial",
    "santistadecora",
    "trussardioficial",
    "trousseauoficial",
    "zeloloja"
]

list_of_tracked_users = [
    "altenburg.oficial",
    "altenburghaus",
    "artelasse",
    "artex",
    "buddemeyeroficial",
    "casaalmeidaoficial",
    "casariachuelo",
    "casa.sonno",
    "hoomybr",
    "karstenoficial",
    "mmartanoficial",
    "santistadecora",
    "trussardioficial",
    "trousseauoficial",
    "zeloloja"
]

- Parametros necesários para a requisição

In [11]:
def get_service_params(service_name: ServiceNames, username: TrackedUsers) -> IRequestInstagramParams:
    services_availables = {
        "instagram_scrapper_api": IRequestInstagramParams(
            username=username,
            base_url="https://instagram-scraper-api2.p.rapidapi.com/",
            users_url="v1/info",
            posts_url="v1.2/posts",
            x_rapidapi_host="instagram-scraper-api2.p.rapidapi.com",
            x_rapidapi_key="6f36805577msh5e42867c3bd4692p12525ajsn80e3751c2d32",
            user_querystring="username_or_id_or_url",
            media_querystring="pagination_token",
            pagination_token=None
        )
    }
    
    return services_availables[service_name]

- Formatação dos parametros de requisição

In [14]:
def format_params_to_headers(params: IRequestInstagramParams) -> Dict[str, str]:
    return {
        "x-rapidapi-host": params.x_rapidapi_host,
        "x-rapidapi-key": params.x_rapidapi_key
    }

def format_user_querystring(params: IRequestInstagramParams) -> Dict[str, str]:
    return {
        params.user_querystring: params.username
    }

def format_media_querystring(params: IRequestInstagramParams) -> Dict[str, str]:
    return {
        params.user_querystring: params.username,
        params.media_querystring: params.pagination_token
    }

def format_users_url(params: IRequestInstagramParams) -> str:
    return params.base_url + params.users_url

def format_media_url(params: IRequestInstagramParams) -> str:
    return params.base_url + params.posts_url

In [15]:
service_params = get_service_params(service_name="instagram_scrapper_api", username="mmartanoficial")

headers = format_params_to_headers(service_params)

querystring_users = format_user_querystring(service_params)

querystring_media = format_media_querystring(service_params)

users_url = format_users_url(service_params)

media_url = format_media_url(service_params)

In [45]:
import requests
response = requests.get(users_url, headers=headers, params=querystring_users)

In [47]:
response.json()

{'data': {'about': None,
  'account_badges': [],
  'account_type': 2,
  'active_standalone_fundraisers': {'fundraisers': [], 'total_count': 0},
  'adjusted_banners_order': [],
  'ads_incentive_expiration_date': None,
  'ads_page_id': 224527784251940,
  'ads_page_name': 'mmartan',
  'bio_links': [{'icon_url': '',
    'image_url': '',
    'is_pinned': False,
    'is_verified': False,
    'link_id': 17989304845577369,
    'link_type': 'external',
    'lynx_url': 'https://l.instagram.com/?u=https%3A%2F%2Flinktr.ee%2Fmmartanoficial%3Ffbclid%3DPAZXh0bgNhZW0CMTEAAabjqFFo99GBD_iWecsqXGRzu_27GkGjJBXLgBu6Jxv1U3ummP1a3Lescgc_aem_StzwIofrKCd-q0KOEIop4Q&e=AT04wYtJCEoZKOIx58H2m3fTTOq09pX2k4bUh1K9dbtiWdM72aQXPkaGpeTjFwWRWk9iBmjxA5Wq5ScOrfbZK0oeYMw8tsXXpjvS8mI',
    'open_external_url_with_in_app_browser': True,
    'title': '',
    'url': 'https://linktr.ee/mmartanoficial'}],
  'biography': 'Desde 1980 criando histórias com a sua casa, seja no toque macio dos lençóis, banho aconchegante ou aromas inc

## Mock das respostas da API

- API = Instagram Scrapper API [Target](https://rapidapi.com/social-api1-instagram/api/instagram-scraper-api2/playground/apiendpoint_b1301387-dc09-4b1f-ba39-b7b51d186b40)

In [107]:
import json

def save_response_to_json(username: TrackedUsers, response):
    with open(username + '.json', 'a', encoding='utf-8') as f:
        json.dump(response, f, ensure_ascii=False, indent=4)

def load_response_from_json(username: TrackedUsers):
    with open(username + '.json', 'r', encoding='utf-8') as f:
        return json.load(f)


In [48]:
response.json()

{'data': {'about': None,
  'account_badges': [],
  'account_type': 2,
  'active_standalone_fundraisers': {'fundraisers': [], 'total_count': 0},
  'adjusted_banners_order': [],
  'ads_incentive_expiration_date': None,
  'ads_page_id': 224527784251940,
  'ads_page_name': 'mmartan',
  'bio_links': [{'icon_url': '',
    'image_url': '',
    'is_pinned': False,
    'is_verified': False,
    'link_id': 17989304845577369,
    'link_type': 'external',
    'lynx_url': 'https://l.instagram.com/?u=https%3A%2F%2Flinktr.ee%2Fmmartanoficial%3Ffbclid%3DPAZXh0bgNhZW0CMTEAAabjqFFo99GBD_iWecsqXGRzu_27GkGjJBXLgBu6Jxv1U3ummP1a3Lescgc_aem_StzwIofrKCd-q0KOEIop4Q&e=AT04wYtJCEoZKOIx58H2m3fTTOq09pX2k4bUh1K9dbtiWdM72aQXPkaGpeTjFwWRWk9iBmjxA5Wq5ScOrfbZK0oeYMw8tsXXpjvS8mI',
    'open_external_url_with_in_app_browser': True,
    'title': '',
    'url': 'https://linktr.ee/mmartanoficial'}],
  'biography': 'Desde 1980 criando histórias com a sua casa, seja no toque macio dos lençóis, banho aconchegante ou aromas inc

## Parsing das respostas da API para objetos

- Retirando apenas as informações necessárias a partir do json

In [21]:
mmartan_service_params = get_service_params(service_name="instagram_scrapper_api", username="mmartanoficial")

mmartan_response = load_response_from_json(mmartan_service_params)

In [128]:
import datetime

def parse_account_info(response):
    data = response['data']
    account_info = InstagramAccountInfo(
        name=data['full_name'],
        username=data['username'],
        follower_count=data['follower_count'],
        total_media=data['media_count'],
        last_update=datetime.datetime.now().isoformat(),
    )
    
    return account_info

In [129]:
info = parse_account_info(mmartan_response)
info

TypeError: 'Response' object is not subscriptable

In [53]:
import pandas as pd
import json

df = pd.read_json(json.dumps(mmartan_response), orient='index')
# response.json()
df

  df = pd.read_json(json.dumps(mmartan_response), orient='index')


Unnamed: 0,about,account_badges,account_category,account_type,active_standalone_fundraisers,additional_business_addresses,adjusted_banners_order,ads_incentive_expiration_date,ads_page_id,ads_page_name,...,show_shoppable_feed,spam_follower_setting_enabled,text_app_last_visited_time,third_party_downloads_enabled,total_ar_effects,total_igtv_videos,transparency_product_enabled,upcoming_events,username,whatsapp_number
data,,[],,2,"{'fundraisers': [], 'total_count': 0}",[],[],,224527784251940,mmartan,...,True,True,NaT,1,0,25,False,[],mmartanoficial,


In [55]:
data = {
    "name": [info.name],
    "username": [info.username],
    "follower_count": [info.follower_count],
    "total_media": [info.total_media],
    "profile_pic_url": [info.profile_pic_url],
    "last_update": [info.last_update],
    "instagram_user_id": [info.instagram_user_id],
    "media_info": [info.media_info]
}
df = pd.DataFrame(data)
df

Unnamed: 0,name,username,follower_count,total_media,profile_pic_url,last_update,instagram_user_id,media_info
0,mmartan,mmartanoficial,1172326,4618,https://scontent-waw2-1.cdninstagram.com/v/t51...,2024-10-01T10:32:01.462155,321202794,


In [56]:
data = {
    "name": [],
    "username": [],
    "follower_count": [],
    "total_media": [],
    "profile_pic_url": [],
    "last_update": [],
    "instagram_user_id": [],
    "media_info": []
}
df_tracked_users = pd.DataFrame(data)
df_tracked_users

Unnamed: 0,name,username,follower_count,total_media,profile_pic_url,last_update,instagram_user_id,media_info


## Buscando dados e salvando

- primeiro vamos buscar todos os dados das marcas e salvar como json

In [75]:
import requests
import pandas as pd
import datetime

In [76]:
artelasse_service_params = get_service_params(service_name="instagram_scrapper_api", username="artelasse")
artex_service_params = get_service_params(service_name="instagram_scrapper_api", username="artex")
camicado_service_params = get_service_params(service_name="instagram_scrapper_api", username="camicado")
casa_almeida_service_params = get_service_params(service_name="instagram_scrapper_api", username="casaalmeidaoficial")
casa_riachuelo_service_params = get_service_params(service_name="instagram_scrapper_api", username="casariachuelo")
karsten_service_params = get_service_params(service_name="instagram_scrapper_api", username="karstenoficial")
mmartan_service_params = get_service_params(service_name="instagram_scrapper_api", username="mmartanoficial")
mundo_enxoval_service_params = get_service_params(service_name="instagram_scrapper_api", username="mundodoenxoval")
santista_service_params = get_service_params(service_name="instagram_scrapper_api", username="santistadecora")
trussardi_service_params = get_service_params(service_name="instagram_scrapper_api", username="trussardioficial")
zelo_service_params = get_service_params(service_name="instagram_scrapper_api", username="zeloloja")

In [61]:
headers = format_params_to_headers(artelasse_service_params)

In [91]:
artelasse_querystring_users = format_user_querystring(artelasse_service_params)
artelasse_querystring_media = format_media_querystring(artelasse_service_params)
artelasse_users_url = format_users_url(artelasse_service_params)
artelasse_media_url = format_media_url(artelasse_service_params)

artelasse_response = requests.get(artelasse_users_url, headers=headers, params=artelasse_querystring_users)

save_response_to_json(username="artelasse", response=artelasse_response.json())

TypeError: Object of type Response is not JSON serializable

In [93]:
artex_querystring_users = format_user_querystring(artex_service_params)
artex_querystring_media = format_media_querystring(artex_service_params)
artex_users_url = format_users_url(artex_service_params)
artex_media_url = format_media_url(artex_service_params)

artex_response = requests.get(artex_users_url, headers=headers, params=artex_querystring_users)

save_response_to_json(username="artex", response=artex_response.json())

In [94]:
camicado_querystring_users = format_user_querystring(camicado_service_params)
camicado_querystring_media = format_media_querystring(camicado_service_params)
camicado_users_url = format_users_url(camicado_service_params)
camicado_media_url = format_media_url(camicado_service_params)

camicado_response = requests.get(camicado_users_url, headers=headers, params=camicado_querystring_users)

save_response_to_json(username="camicado", response=camicado_response.json())

In [99]:
casa_almeida_querystring_media = format_media_querystring(casa_almeida_service_params)
casa_almeida_media_url = format_media_url(casa_almeida_service_params)
casa_almeida_querystring_users = format_user_querystring(casa_almeida_service_params)
casa_almeida_users_url = format_users_url(casa_almeida_service_params)

casa_almeida_response = requests.get(casa_almeida_users_url, headers=headers, params=casa_almeida_querystring_users)

save_response_to_json(username="casaalmeidaoficial", response=casa_almeida_response.json())

In [96]:
casa_riachuelo_querystring_users = format_user_querystring(casa_riachuelo_service_params)
casa_riachuelo_querystring_media = format_media_querystring(casa_riachuelo_service_params)
casa_riachuelo_users_url = format_users_url(casa_riachuelo_service_params)
casa_riachuelo_media_url = format_media_url(casa_riachuelo_service_params)

casa_riachuelo_response = requests.get(casa_riachuelo_users_url, headers=headers, params=casa_riachuelo_querystring_users)

save_response_to_json(username="casariachuelo", response=casa_riachuelo_response.json())

In [100]:
karsten_querystring_users = format_user_querystring(karsten_service_params)
karsten_querystring_media = format_media_querystring(karsten_service_params)
karsten_users_url = format_users_url(karsten_service_params)
karsten_media_url = format_media_url(karsten_service_params)

karsten_response = requests.get(karsten_users_url, headers=headers, params=karsten_querystring_users)

save_response_to_json(username="karstenoficial", response=karsten_response.json())

In [101]:
mmartan_querystring_users = format_user_querystring(mmartan_service_params)
mmartan_querystring_media = format_media_querystring(mmartan_service_params)
mmartan_users_url = format_users_url(mmartan_service_params)
mmartan_media_url = format_media_url(mmartan_service_params)

mmartan_response = requests.get(mmartan_users_url, headers=headers, params=mmartan_querystring_users)

save_response_to_json(username="mmartanoficial", response=mmartan_response.json())

In [102]:
mundo_enxoval_querystring_users = format_user_querystring(mundo_enxoval_service_params)
mundo_enxoval_querystring_media = format_media_querystring(mundo_enxoval_service_params)
mundo_enxoval_users_url = format_users_url(mundo_enxoval_service_params)
mundo_enxoval_media_url = format_media_url(mundo_enxoval_service_params)

mundo_enxoval_response = requests.get(mundo_enxoval_users_url, headers=headers, params=mundo_enxoval_querystring_users)

save_response_to_json(username="mundodoenxoval", response=casa_almeida_response.json())

In [103]:
santista_querystring_users = format_user_querystring(santista_service_params)
santista_querystring_media = format_media_querystring(santista_service_params)
santista_users_url = format_users_url(santista_service_params)
santista_media_url = format_media_url(santista_service_params)

santista_response = requests.get(santista_users_url, headers=headers, params=santista_querystring_users)

save_response_to_json(username="santistadecora", response=santista_response.json())

In [104]:
trussardi_querystring_users = format_user_querystring(trussardi_service_params)
trussardi_querystring_media = format_media_querystring(trussardi_service_params)
trussardi_users_url = format_users_url(trussardi_service_params)
trussardi_media_url = format_media_url(trussardi_service_params)

trussardi_response = requests.get(trussardi_users_url, headers=headers, params=trussardi_querystring_users)

save_response_to_json(username="trussardioficial", response=trussardi_response.json())

In [105]:
zelo_querystring_users = format_user_querystring(zelo_service_params)
zelo_querystring_media = format_media_querystring(zelo_service_params)
zelo_users_url = format_users_url(zelo_service_params)
zelo_media_url = format_media_url(zelo_service_params)

zelo_response = requests.get(zelo_users_url, headers=headers, params=zelo_querystring_users)

save_response_to_json(username="zeloloja", response=zelo_response.json())

- Montando um DataFrame a partir dos dados coletados

In [111]:
artelasse = parse_account_info(load_response_from_json('artelasse'))
artex = parse_account_info(load_response_from_json('artex'))
camicado = parse_account_info(load_response_from_json("camicado"))
casaalmeidaoficial = parse_account_info(load_response_from_json("casaalmeidaoficial"))
casariachuelo = parse_account_info(load_response_from_json("casariachuelo"))
karstenoficial = parse_account_info(load_response_from_json("karstenoficial"))
mmartanoficial = parse_account_info(load_response_from_json("mmartanoficial"))
mundodoenxoval = parse_account_info(load_response_from_json("mundodoenxoval"))
santistadecora = parse_account_info(load_response_from_json("santistadecora"))
trussardioficial = parse_account_info(load_response_from_json("trussardioficial"))
zeloloja = parse_account_info(load_response_from_json("zeloloja"))

In [127]:
from typing import List

users_list: List[InstagramAccountInfo] = [
    artelasse,
    artex,
    camicado,
    casaalmeidaoficial,
    casariachuelo,
    karstenoficial,
    mmartanoficial,
    mundodoenxoval,
    santistadecora,
    trussardioficial,
    zeloloja
]
data = {
    "name": [],
    "username": [],
    "follower_count": [],
    "total_media": [],
    "last_update": [],
}

df_tracked_users = pd.DataFrame(data)

for user in users_list:
    value = pd.Series([user.name, user.username, user.follower_count, user.total_media, user.last_update])
    df_tracked_users = df_tracked_users._append(dict(zip(df_tracked_users.columns, value)), ignore_index=True)

df_tracked_users


Unnamed: 0,name,username,follower_count,total_media,last_update
0,Artelassê,artelasse,283555.0,3894.0,2024-10-01T16:09:13.987954
1,ARTEX,artex,1029978.0,4081.0,2024-10-01T16:09:13.989424
2,Camicado,camicado,2450449.0,4474.0,2024-10-01T16:09:13.989649
3,Casa Almeida,casaalmeidaoficial,103733.0,1543.0,2024-10-01T16:09:13.989791
4,Casa Riachuelo,casariachuelo,1110401.0,2210.0,2024-10-01T16:09:13.989927
5,Karsten Oficial,karstenoficial,737112.0,4220.0,2024-10-01T16:09:13.990041
6,mmartan,mmartanoficial,1172320.0,4619.0,2024-10-01T16:09:13.990153
7,Casa Almeida,casaalmeidaoficial,103733.0,1543.0,2024-10-01T16:09:13.990262
8,Santista,santistadecora,1406067.0,4415.0,2024-10-01T16:09:13.990369
9,Trussardi,trussardioficial,294117.0,634.0,2024-10-01T16:09:13.990558
