## Getting to know the requests library

### First request

In [37]:
import requests 

r = requests.get('https://api.github.com/events')
print(r)

<Response [200]>


- In the command above, the base URL (the beginning of the URL, https://api.github.com) is the web address of the API itself, while the "/events" at the end of the URL (also known as the **endpoint**) identifies the API resource we want to access.

- The **text** attribute lets us see what the API actually returned, but it is not very useful for data processing because its value is a plain string.
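
The split between base URL and endpoint described above can be sketched without making any request. This is a minimal illustration; the names `build_url` and `endpoint` are assumptions introduced here and are not part of the requests library.

```python
def build_url(api_base_url: str, endpoint: str) -> str:
    """Join the API base URL and an endpoint path with exactly one slash."""
    return f"{api_base_url.rstrip('/')}/{endpoint.lstrip('/')}"


api_base_url = "https://api.github.com"  # address of the API itself

events_url = build_url(api_base_url, "/events")     # resource: public events
versions_url = build_url(api_base_url, "versions")  # resource: API versions

print(events_url)
print(versions_url)
```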

In [38]:
print(r.text)

[{"id":"39000835722","type":"PushEvent","actor":{"id":5514030,"login":"javastraat","display_login":"javastraat","gravatar_id":"","url":"https://api.github.com/users/javastraat","avatar_url":"https://avatars.githubusercontent.com/u/5514030?"},"repo":{"id":194516388,"name":"DMR-Database/md380tools","url":"https://api.github.com/repos/DMR-Database/md380tools"},"payload":{"repository_id":194516388,"push_id":18731865680,"size":1,"distinct_size":1,"ref":"refs/heads/master","head":"dc485f5fbf4e312dbe764da04191dd7c835e62eb","before":"dd41b9d6c8eaa0ef97ee13fad424fdfe25ee0fb0","commits":[{"sha":"dc485f5fbf4e312dbe764da04191dd7c835e62eb","author":{"email":"javastraat@hotmail.com","name":"Albert Einstein"},"message":"<new firmware-noGPS.bin *autogenerated* by pd2emc>","distinct":true,"url":"https://api.github.com/repos/DMR-Database/md380tools/commits/dc485f5fbf4e312dbe764da04191dd7c835e62eb"}]},"public":true,"created_at":"2024-06-04T22:16:36Z","org":{"id":52372652,"login":"DMR-Database","gravatar_

In [39]:
print(r.text[0])
print(r.text[10])

[
0
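
The output above follows from `r.text` being a plain string: indexing it returns single characters, not records. A small offline sketch, using a shortened fragment of the response as a hypothetical `sample_text`:

```python
# Shortened, hand-copied fragment of the response body (assumption: stands in for r.text)
sample_text = '[{"id":"39000835722","type":"PushEvent"}]'

print(type(sample_text).__name__)  # the body is a str, not a list
print(sample_text[0])              # first character of the string
print(sample_text[10])             # eleventh character, a digit inside the id value
```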


- To load the API response into Python objects (JSON, the format most current APIs return, maps objects to dictionaries and arrays to lists), we can use either the **json** method built into the response object of the requests library or the **loads** function of the *json* library, which takes a JSON string and converts it into the corresponding Python object. For this endpoint the result is a list of dictionaries.

In [40]:
# method 1) built-in json method
response = r.json()
print(response[0]) 
print(response[3])

# response is a list of dictionaries

{'id': '39000835722', 'type': 'PushEvent', 'actor': {'id': 5514030, 'login': 'javastraat', 'display_login': 'javastraat', 'gravatar_id': '', 'url': 'https://api.github.com/users/javastraat', 'avatar_url': 'https://avatars.githubusercontent.com/u/5514030?'}, 'repo': {'id': 194516388, 'name': 'DMR-Database/md380tools', 'url': 'https://api.github.com/repos/DMR-Database/md380tools'}, 'payload': {'repository_id': 194516388, 'push_id': 18731865680, 'size': 1, 'distinct_size': 1, 'ref': 'refs/heads/master', 'head': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'before': 'dd41b9d6c8eaa0ef97ee13fad424fdfe25ee0fb0', 'commits': [{'sha': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'author': {'email': 'javastraat@hotmail.com', 'name': 'Albert Einstein'}, 'message': '<new firmware-noGPS.bin *autogenerated* by pd2emc>', 'distinct': True, 'url': 'https://api.github.com/repos/DMR-Database/md380tools/commits/dc485f5fbf4e312dbe764da04191dd7c835e62eb'}]}, 'public': True, 'created_at': '2024-06-04T22:16:36Z'

In [41]:
# method 2) using the json library
import json 

response = json.loads(r.text)
print(response[0])
print(response[3])

{'id': '39000835722', 'type': 'PushEvent', 'actor': {'id': 5514030, 'login': 'javastraat', 'display_login': 'javastraat', 'gravatar_id': '', 'url': 'https://api.github.com/users/javastraat', 'avatar_url': 'https://avatars.githubusercontent.com/u/5514030?'}, 'repo': {'id': 194516388, 'name': 'DMR-Database/md380tools', 'url': 'https://api.github.com/repos/DMR-Database/md380tools'}, 'payload': {'repository_id': 194516388, 'push_id': 18731865680, 'size': 1, 'distinct_size': 1, 'ref': 'refs/heads/master', 'head': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'before': 'dd41b9d6c8eaa0ef97ee13fad424fdfe25ee0fb0', 'commits': [{'sha': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'author': {'email': 'javastraat@hotmail.com', 'name': 'Albert Einstein'}, 'message': '<new firmware-noGPS.bin *autogenerated* by pd2emc>', 'distinct': True, 'url': 'https://api.github.com/repos/DMR-Database/md380tools/commits/dc485f5fbf4e312dbe764da04191dd7c835e62eb'}]}, 'public': True, 'created_at': '2024-06-04T22:16:36Z'
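
Both methods produce the same Python objects. The sketch below reproduces the parsing step offline with `json.loads`, using a shortened, hand-written `sample_text` (an assumption standing in for `r.text`), and then reads nested fields by key.

```python
import json

# Hand-written stand-in for r.text, modeled on the events shown above
sample_text = (
    '[{"id": "39000835722", "type": "PushEvent",'
    ' "actor": {"login": "javastraat"}, "public": true}]'
)

events = json.loads(sample_text)  # JSON array -> Python list
first_event = events[0]           # JSON object -> Python dict

print(type(events).__name__, type(first_event).__name__)
print(first_event["type"])
print(first_event["actor"]["login"])  # nested dictionaries are indexed key by key
print(first_event["public"])          # JSON true becomes Python True
```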

### Exploring the library

In [42]:
print(r.status_code)
print(r.url)

200
https://api.github.com/events
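
Checking `status_code` by hand works, but requests can also turn error statuses into exceptions through `raise_for_status()`. A minimal offline sketch, assuming a manually built `requests.Response` object (something ordinary code would rarely do) to simulate a 404 without contacting the API; the URL below is hypothetical.

```python
import requests

# Built by hand purely for illustration; real responses come from requests.get(...)
fake_response = requests.Response()
fake_response.status_code = 404
fake_response.url = "https://api.github.com/not-a-real-endpoint"  # hypothetical URL

print(fake_response.ok)  # False for 4xx and 5xx status codes

try:
    fake_response.raise_for_status()
    outcome = "success"
except requests.HTTPError as error:
    outcome = f"failed with status {error.response.status_code}"

print(outcome)
```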


In [43]:
print(r.text)

[{"id":"39000835722","type":"PushEvent","actor":{"id":5514030,"login":"javastraat","display_login":"javastraat","gravatar_id":"","url":"https://api.github.com/users/javastraat","avatar_url":"https://avatars.githubusercontent.com/u/5514030?"},"repo":{"id":194516388,"name":"DMR-Database/md380tools","url":"https://api.github.com/repos/DMR-Database/md380tools"},"payload":{"repository_id":194516388,"push_id":18731865680,"size":1,"distinct_size":1,"ref":"refs/heads/master","head":"dc485f5fbf4e312dbe764da04191dd7c835e62eb","before":"dd41b9d6c8eaa0ef97ee13fad424fdfe25ee0fb0","commits":[{"sha":"dc485f5fbf4e312dbe764da04191dd7c835e62eb","author":{"email":"javastraat@hotmail.com","name":"Albert Einstein"},"message":"<new firmware-noGPS.bin *autogenerated* by pd2emc>","distinct":true,"url":"https://api.github.com/repos/DMR-Database/md380tools/commits/dc485f5fbf4e312dbe764da04191dd7c835e62eb"}]},"public":true,"created_at":"2024-06-04T22:16:36Z","org":{"id":52372652,"login":"DMR-Database","gravatar_

In [44]:
print(r.json())

[{'id': '39000835722', 'type': 'PushEvent', 'actor': {'id': 5514030, 'login': 'javastraat', 'display_login': 'javastraat', 'gravatar_id': '', 'url': 'https://api.github.com/users/javastraat', 'avatar_url': 'https://avatars.githubusercontent.com/u/5514030?'}, 'repo': {'id': 194516388, 'name': 'DMR-Database/md380tools', 'url': 'https://api.github.com/repos/DMR-Database/md380tools'}, 'payload': {'repository_id': 194516388, 'push_id': 18731865680, 'size': 1, 'distinct_size': 1, 'ref': 'refs/heads/master', 'head': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'before': 'dd41b9d6c8eaa0ef97ee13fad424fdfe25ee0fb0', 'commits': [{'sha': 'dc485f5fbf4e312dbe764da04191dd7c835e62eb', 'author': {'email': 'javastraat@hotmail.com', 'name': 'Albert Einstein'}, 'message': '<new firmware-noGPS.bin *autogenerated* by pd2emc>', 'distinct': True, 'url': 'https://api.github.com/repos/DMR-Database/md380tools/commits/dc485f5fbf4e312dbe764da04191dd7c835e62eb'}]}, 'public': True, 'created_at': '2024-06-04T22:16:36Z

### Using another endpoint

In [45]:
r = requests.get('https://api.github.com/versions')

print(r.status_code)
print(r.json())

200
['2022-11-28']


- We can specify which API version we want to query through the *headers* parameter of the request. This parameter exists precisely to attach extra "settings" or "options" to the call.

In [46]:
# specifying the API version
header = {'X-GitHub-Api-Version': '2022-11-28'}

# note: how header fields must/can be set varies from API to API, so always check the documentation

In [47]:
r = requests.get('https://api.github.com/versions', headers=header)
print(r.text)

["2022-11-28"]
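
To see exactly which headers a call would carry, the request can be prepared without being sent. A sketch, assuming we also add the `Accept: application/vnd.github+json` media type that GitHub's documentation recommends (it is not used in the cells above):

```python
import requests

headers = {
    "X-GitHub-Api-Version": "2022-11-28",
    "Accept": "application/vnd.github+json",  # assumption: recommended by GitHub, optional here
}

# Request(...).prepare() builds the final request object but does not send it
prepared = requests.Request(
    "GET", "https://api.github.com/versions", headers=headers
).prepare()

print(prepared.method, prepared.url)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```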


### Challenge - Querying user data

In [48]:
response = requests.get('https://api.github.com/users/Lucas01iveira')

# validation
print(response)
print(response.status_code)
print(response.text)
print(response.url)

<Response [200]>
200
{"login":"Lucas01iveira","id":87955029,"node_id":"MDQ6VXNlcjg3OTU1MDI5","avatar_url":"https://avatars.githubusercontent.com/u/87955029?v=4","gravatar_id":"","url":"https://api.github.com/users/Lucas01iveira","html_url":"https://github.com/Lucas01iveira","followers_url":"https://api.github.com/users/Lucas01iveira/followers","following_url":"https://api.github.com/users/Lucas01iveira/following{/other_user}","gists_url":"https://api.github.com/users/Lucas01iveira/gists{/gist_id}","starred_url":"https://api.github.com/users/Lucas01iveira/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/Lucas01iveira/subscriptions","organizations_url":"https://api.github.com/users/Lucas01iveira/orgs","repos_url":"https://api.github.com/users/Lucas01iveira/repos","events_url":"https://api.github.com/users/Lucas01iveira/events{/privacy}","received_events_url":"https://api.github.com/users/Lucas01iveira/received_events","type":"User","site_admin":false,"name":"Luca

In [49]:
response.json()

{'login': 'Lucas01iveira',
 'id': 87955029,
 'node_id': 'MDQ6VXNlcjg3OTU1MDI5',
 'avatar_url': 'https://avatars.githubusercontent.com/u/87955029?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/Lucas01iveira',
 'html_url': 'https://github.com/Lucas01iveira',
 'followers_url': 'https://api.github.com/users/Lucas01iveira/followers',
 'following_url': 'https://api.github.com/users/Lucas01iveira/following{/other_user}',
 'gists_url': 'https://api.github.com/users/Lucas01iveira/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/Lucas01iveira/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/Lucas01iveira/subscriptions',
 'organizations_url': 'https://api.github.com/users/Lucas01iveira/orgs',
 'repos_url': 'https://api.github.com/users/Lucas01iveira/repos',
 'events_url': 'https://api.github.com/users/Lucas01iveira/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/Lucas01iveira/received_events',
 'type': 'User',
 '

In [50]:
response_json = response.json()

print('Name: {}'.format(response_json['name']))
print('Username: {}'.format(response_json['login']))
print('Public repositories: {}'.format(response_json['public_repos']))
print('Account created at: {}'.format(response_json['created_at']))

Name: Lucas de Paula Oliveira
Username: Lucas01iveira
Public repositories: 10
Account created at: 2021-07-26T00:49:29Z


### Challenge - Fetching follower data from the Amazon profile
- This challenge goes deeper into authentication (for this API, the OAuth method) and pagination (splitting a large result set into pages so that large volumes of data can be retrieved in parts).

- Note: because sensitive data (such as my user's access token) should not be exposed in a Git repository, the solution code is kept in a separate file outside Git version control.
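
One common way to keep a token out of version control is to read it from an environment variable. A minimal sketch, assuming a variable named `GITHUB_TOKEN` and a helper `build_auth_headers` (both names are illustrative choices, not something the API requires):

```python
import os


def build_auth_headers(token=None):
    """Return request headers, adding Authorization only when a token exists."""
    headers = {"X-GitHub-Api-Version": "2022-11-28"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Reads the token from the environment; the result is None if the variable is not set
token = os.environ.get("GITHUB_TOKEN")
headers = build_auth_headers(token)

# Never print the token itself; only report whether it was found
print("authenticated" if "Authorization" in headers else "anonymous")
```

The resulting dictionary can then be passed as the `headers` argument of `requests.get`.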

### Transforming the data

In [54]:
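api_base_url = 'https://api.github.com'
owner = 'amzn'  # username whose data we will extract
url = f'{api_base_url}/users/{owner}/repos'

# `token` is intentionally not shown; with a token defined, use instead:
# headers = {'Authorization': 'Bearer ' + token}
headers = {'X-GitHub-Api-Version': '2022-11-28'}

repos_list = []
for page_num in range(1, 7):
    try:
        url_page = f'{url}?page={page_num}'
        response = requests.get(url_page, headers=headers)
        repos_list.append(response.json())
    except (requests.RequestException, ValueError):
        # connection problem or non-JSON body: record the page as missing
        repos_list.append(None)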
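
Instead of concatenating `?page=` by hand, the query string can be passed through the `params` argument, which requests encodes into the URL. The sketch below only prepares the requests without sending them; `per_page` is a GitHub query parameter (up to 100 items per page) that the loop above does not use and is added here as an assumption.

```python
import requests

url = "https://api.github.com/users/amzn/repos"

page_urls = []
for page_num in range(1, 4):
    # prepare() only builds the request; nothing is sent over the network
    prepared = requests.Request(
        "GET", url, params={"page": page_num, "per_page": 100}
    ).prepare()
    page_urls.append(prepared.url)

for page_url in page_urls:
    print(page_url)
```

In a real call, the same dictionary would be passed as `requests.get(url, params=..., headers=headers)`.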


In [55]:
repos_list

[[{'id': 171339259,
   'node_id': 'MDEwOlJlcG9zaXRvcnkxNzEzMzkyNTk=',
   'name': '.github',
   'full_name': 'amzn/.github',
   'private': False,
   'owner': {'login': 'amzn',
    'id': 8594673,
    'node_id': 'MDEyOk9yZ2FuaXphdGlvbjg1OTQ2NzM=',
    'avatar_url': 'https://avatars.githubusercontent.com/u/8594673?v=4',
    'gravatar_id': '',
    'url': 'https://api.github.com/users/amzn',
    'html_url': 'https://github.com/amzn',
    'followers_url': 'https://api.github.com/users/amzn/followers',
    'following_url': 'https://api.github.com/users/amzn/following{/other_user}',
    'gists_url': 'https://api.github.com/users/amzn/gists{/gist_id}',
    'starred_url': 'https://api.github.com/users/amzn/starred{/owner}{/repo}',
    'subscriptions_url': 'https://api.github.com/users/amzn/subscriptions',
    'organizations_url': 'https://api.github.com/users/amzn/orgs',
    'repos_url': 'https://api.github.com/users/amzn/repos',
    'events_url': 'https://api.github.com/users/amzn/events{/privac

In [56]:
repos_list[0][0]['name']

'.github'

In [57]:
repos_names = []
repos_languages = []
for page in repos_list:
    if not page:  # skip pages that failed to download (stored as None) or came back empty
        continue
    for repo in page:
        repos_names.append(repo['name'])
        repos_languages.append(repo['language'])


In [58]:
print(repos_names[:10])
print(repos_languages[:10])

['.github', 'ads-advanced-tools-docs', 'ads-pao-amznjs-gtm-template', 'alexa-coho', 'alexa-skills-kit-js', 'amazon-ads-advertiser-audience-normalization-sdk-py', 'amazon-advertising-api-php-sdk', 'amazon-codeguru-profiler-for-spark', 'amazon-frustration-free-setup-certification-tool', 'amazon-hub-counter-api-docs']
[None, None, 'Smarty', 'JavaScript', None, 'Python', 'PHP', 'Java', 'Python', 'CSS']


In [59]:
print(len(repos_names))
print(len(repos_languages))

154
154
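
With names and languages in two parallel lists, a quick summary is possible before building any table. A sketch using `collections.Counter` on the ten languages printed above (`sample_languages` is a hand-copied assumption, not the full list of 154):

```python
from collections import Counter

# First ten languages from the output above; None means no language was reported
sample_languages = [None, None, "Smarty", "JavaScript", None,
                    "Python", "PHP", "Java", "Python", "CSS"]

# Count only repositories that have a language
language_counts = Counter(lang for lang in sample_languages if lang is not None)
missing = sample_languages.count(None)

print(language_counts.most_common(3))
print(f"repositories without a language: {missing}")
```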


### Building a data frame from the collected information

In [60]:
import pandas as pd 

data_mapping = {'Repositorios': repos_names, 'LinguagensProgramacao': repos_languages}

df = pd.DataFrame(data= data_mapping)
df.to_csv(r'C:\Users\LUCAS\Documents\Data Engineering\Data-Engineering\Level 1 - Data Engineering\Formation - First Steps as a Data Engineer\Python and APIs - Knowing the requests module\InformativoAmazon.csv', index=False, header=True, encoding='utf-8')

In [61]:
df.head()

|   | Repositorios | LinguagensProgramacao |
|---|---|---|
| 0 | .github | |
| 1 | ads-advanced-tools-docs | |
| 2 | ads-pao-amznjs-gtm-template | Smarty |
| 3 | alexa-coho | JavaScript |
| 4 | alexa-skills-kit-js | |


In [62]:
# another way to get the same result:
df = pd.DataFrame()
df['repository_names'] = repos_names
df['languages'] = repos_languages

df.head()

|   | repository_names | languages |
|---|---|---|
| 0 | .github | |
| 1 | ads-advanced-tools-docs | |
| 2 | ads-pao-amznjs-gtm-template | Smarty |
| 3 | alexa-coho | JavaScript |
| 4 | alexa-skills-kit-js | |


### Hands-on challenge - Structuring the follower names

In [80]:
api_base_url = 'https://api.github.com'
username = 'amzn'  # username whose data we will extract
url_final = f'{api_base_url}/users/{username}/followers'

# `header` (API version only) comes from the earlier cell; with a token available, use instead:
#header = {'Authorization': 'Bearer '+token,'X-GitHub-Api-Version': '2022-11-28'}

followers = []
num_page = 1
while True:
    url = f'{url_final}?page={num_page}'
    #print(url)

    r = requests.get(url, headers=header)
    #print(r.status_code)

    # check the status before parsing, so an error body is never treated as data
    if r.status_code != 200:
        raise ValueError(f'Request failed on page {num_page}. Please check.')

    response = r.json()
    #print(response)

    # an empty page means there are no more followers to read
    if len(response) == 0:
        break

    followers.append(response)
    num_page+=1
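
The loop above stops at the first empty page. The same stopping rule can be isolated in a small function that is testable without network access. In this sketch, `collect_pages`, `fetch_page`, and `max_pages` are hypothetical names; `fetch_page` stands in for a function that would call `requests.get` and return `r.json()`.

```python
def collect_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until an empty page or max_pages."""
    pages = []
    for page_num in range(1, max_pages + 1):
        page = fetch_page(page_num)
        if not page:  # an empty list means there is nothing left to read
            break
        pages.append(page)
    return pages


# Fake data standing in for the API: two pages of followers, then nothing
fake_api = {
    1: [{"login": "tkersey"}, {"login": "njonsson"}],
    2: [{"login": "bangpound"}],
}

followers = collect_pages(lambda page_num: fake_api.get(page_num, []))

print(len(followers))                     # number of non-empty pages
print([len(page) for page in followers])  # items per page
```

The `max_pages` cap is a safety net, so that an API that never returns an empty page cannot keep the loop running forever.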

In [81]:
print(followers)

[[{'login': 'tkersey', 'id': 217, 'node_id': 'MDQ6VXNlcjIxNw==', 'avatar_url': 'https://avatars.githubusercontent.com/u/217?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/tkersey', 'html_url': 'https://github.com/tkersey', 'followers_url': 'https://api.github.com/users/tkersey/followers', 'following_url': 'https://api.github.com/users/tkersey/following{/other_user}', 'gists_url': 'https://api.github.com/users/tkersey/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/tkersey/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/tkersey/subscriptions', 'organizations_url': 'https://api.github.com/users/tkersey/orgs', 'repos_url': 'https://api.github.com/users/tkersey/repos', 'events_url': 'https://api.github.com/users/tkersey/events{/privacy}', 'received_events_url': 'https://api.github.com/users/tkersey/received_events', 'type': 'User', 'site_admin': False}, {'login': 'njonsson', 'id': 645, 'node_id': 'MDQ6VXNlcjY0NQ==', 'avatar_url': 'h

In [83]:
followers[0][0]

{'login': 'tkersey',
 'id': 217,
 'node_id': 'MDQ6VXNlcjIxNw==',
 'avatar_url': 'https://avatars.githubusercontent.com/u/217?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/tkersey',
 'html_url': 'https://github.com/tkersey',
 'followers_url': 'https://api.github.com/users/tkersey/followers',
 'following_url': 'https://api.github.com/users/tkersey/following{/other_user}',
 'gists_url': 'https://api.github.com/users/tkersey/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/tkersey/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/tkersey/subscriptions',
 'organizations_url': 'https://api.github.com/users/tkersey/orgs',
 'repos_url': 'https://api.github.com/users/tkersey/repos',
 'events_url': 'https://api.github.com/users/tkersey/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/tkersey/received_events',
 'type': 'User',
 'site_admin': False}

In [86]:
followers_names = []
for page in followers:
    for follower in page:
        followers_names.append(follower['login'])

followers_names[:5]

['tkersey', 'njonsson', 'bangpound', 'koconder', 'Rud5G']
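
The nested loop can also be written as a single list comprehension. A sketch on a small hand-made `followers` structure (an assumption mirroring the list of pages shown above):

```python
# Two fake pages in the same shape as the API result: a list of lists of dicts
followers = [
    [{"login": "tkersey"}, {"login": "njonsson"}],
    [{"login": "bangpound"}],
]

# Outer loop over pages, inner loop over followers, same order as the nested for
followers_names = [follower["login"] for page in followers for follower in page]

print(followers_names)
```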

In [87]:
data_map = {'FollowersNames': followers_names}
df = pd.DataFrame(data= data_map)

display(df.head())

|   | FollowersNames |
|---|---|
| 0 | tkersey |
| 1 | njonsson |
| 2 | bangpound |
| 3 | koconder |
| 4 | Rud5G |
