## TheyDo Analytics Engineer: GitHub repo analysis based on issue label

**Objective:** Assess the skills involved in bringing new data products to market, and test the ability to integrate external data, store it, and provide a simple visualisation. You have a greenfield choice of platforms and tooling: pick whatever suits the test solution and would also be applicable in a business environment.

**Required Deliverables:**
1) Data flow implementation from the GitHub repo issue list
2) Data warehouse or DB setup to store this data 
3) Visualisation solution on top of data 

**Steps:**
1) Pick a DWH/DB and a data flow solution, or write a script of your own.
   * Fetch issue information from GitHub for a particular repo (configurable or passed as an input argument).
2) Store the fetched information.
3) Provide a visualisation of tickets created with specific labels, grouped monthly for the year 2023, based on the data stored in the previous step.
4) Present your approach in a 1-hour interview call. If possible, share the deliverables 2 days ahead of the call.

**Evaluation Criteria:**
* Logic and simplicity of the data flow 
* Quality and organization of data schemas
* Clarity of visualisation 
* Ability to explain your decisions and design choices

**Prerequisites**
Install the `requests` library, which simplifies sending HTTP requests, and `pandas`, which the notebook uses to tabulate the results, with pip:
`pip install requests pandas`

**Step 1:**
**Generate a Personal Access Token (PAT) on GitHub**
1. Log in to GitHub.
2. Go to Settings > Developer settings > Personal access tokens.
3. Click on "Generate new token".
4. Give the token a descriptive name, select the scopes or permissions it needs, and click "Generate token" at the bottom.


The script below collects the ID, title, state, and timestamps of each issue in the specified repository, with one row per label attached to the issue, and displays the result as a DataFrame.

**Using Python to Access GitHub API**
Use the `requests` library to send an HTTP GET request to the GitHub REST API and retrieve the issues of a specific repository.
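
The brief asks for the repository to be configurable or passed as an input argument. A minimal sketch of one way to do that with `argparse` follows; the function names `parse_repo` and `issues_url` and the `owner/name` argument format are illustrative assumptions, not part of the assignment.

```python
import argparse

API_ROOT = "https://api.github.com"


def parse_repo(argv=None):
    """Read the target repository ("owner/name") from the command line."""
    parser = argparse.ArgumentParser(
        description="Fetch GitHub issues for a repository"
    )
    parser.add_argument(
        "repo",
        nargs="?",
        default="GoogleCloudPlatform/python-docs-samples",
        help="repository in owner/name form",
    )
    args = parser.parse_args(argv)
    owner, sep, name = args.repo.partition("/")
    # Reject values such as "owner", "owner/" or "owner/name/extra".
    if not (owner and sep and name) or "/" in name:
        parser.error(f"expected owner/name, got {args.repo!r}")
    return args.repo


def issues_url(repo):
    """Build the REST endpoint that lists issues for the given repository."""
    return f"{API_ROOT}/repos/{repo}/issues"
```

In a script, `issues_url(parse_repo())` reads the repository from `sys.argv`. Inside a notebook it is safer to pass the argument list explicitly, for example `parse_repo(["GoogleCloudPlatform/python-docs-samples"])`.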


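In [1]:
import os

import pandas as pd
import requests

# Read the personal access token from the environment instead of hard-coding it.
# GITHUB_TOKEN is an assumed variable name; set it before starting the notebook.
headers = {
    'Authorization': f"token {os.environ['GITHUB_TOKEN']}",
    'Accept': 'application/vnd.github.v3+json',
}

url = "https://api.github.com/repos/GoogleCloudPlatform/python-docs-samples/issues"
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail early on authentication or rate-limit errors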
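
A single GET like the one above returns only the first page of results (30 items by default) and only open issues unless `state` is set, so covering all of 2023 requires paging. The sketch below follows the `Link` header through `response.links`. The function name `fetch_all_issues` and the injectable `get` parameter, added so the loop can be exercised without network access, are assumptions for illustration.

```python
def fetch_all_issues(repo, token=None, get=None, include_pull_requests=False):
    """Return every issue in `repo` ("owner/name"), following pagination."""
    if get is None:
        import requests  # imported lazily so a stub `get` can be used offline
        get = requests.get

    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        headers["Authorization"] = f"token {token}"

    url = f"https://api.github.com/repos/{repo}/issues"
    # state=all includes closed issues; per_page=100 is the API maximum.
    params = {"state": "all", "per_page": 100}
    issues = []
    while url:
        response = get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        issues.extend(response.json())
        # requests parses the Link header; "next" is absent on the last page.
        url = response.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries the query string
    if not include_pull_requests:
        # The issues endpoint also returns pull requests, marked by this key.
        issues = [i for i in issues if "pull_request" not in i]
    return issues
```

Dropping pull requests by default is a design choice for this sketch: the brief talks about tickets, and the endpoint would otherwise mix pull requests into the counts.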

In [31]:
# Parse the JSON body of the response into a list of issue dictionaries
issues = response.json()

labels_data = []

# Build one row per (issue, label) pair; issues without labels produce no rows
for issue in issues:
    for label in issue['labels']:
        # You can add more attributes here as needed
        labels_data.append({
            'issue_id': issue['id'],
            'issue_title': issue['title'],
            'status': issue['state'],
            'created_at': issue['created_at'],
            'updated_at': issue['updated_at'],
            'closed_at': issue['closed_at'],
            'label_id': label['id'],
            'label_name': label['name'],
            'label_description': label['description'],
            'label_color': label['color'],
        })

# Create a DataFrame from the label data
df_labels = pd.DataFrame(labels_data)

# Display the DataFrame
df_labels

| | issue_id | issue_title | status | created_at | updated_at | closed_at | label_id | label_name | label_description | label_color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2133363347 | fix(firestore): failed recursive delete in sni... | open | 2024-02-14T00:47:00Z | 2024-02-14T00:47:18Z | | 1679545544 | api: firestore | Issues related to the Firestore API. | d59ac3 |
| 1 | 2133363347 | fix(firestore): failed recursive delete in sni... | open | 2024-02-14T00:47:00Z | 2024-02-14T00:47:18Z | | 2209514716 | samples | Issues that are directly related to samples. | 9932CC |
| 2 | 2133259597 | There is a typo in handle_response_sample_v1be... | open | 2024-02-13T22:49:56Z | 2024-02-13T22:50:16Z | | 1593789886 | triage me | I really want to be triaged. | FF69B4 |
| 3 | 2133259597 | There is a typo in handle_response_sample_v1be... | open | 2024-02-13T22:49:56Z | 2024-02-13T22:50:16Z | | 1593789918 | priority: p2 | Moderately-important priority. Fix may not be ... | fef2c0 |
| 4 | 2133259597 | There is a typo in handle_response_sample_v1be... | open | 2024-02-13T22:49:56Z | 2024-02-13T22:50:16Z | | 1593790030 | type: bug | Error or flaw in code with unintended results ... | db4437 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 108 | 2116420725 | automl.tables.batch_predict_test: test_batch_p... | open | 2024-02-03T10:48:23Z | 2024-02-04T10:25:10Z | | 1593790030 | type: bug | Error or flaw in code with unintended results ... | db4437 |
| 109 | 2116420725 | automl.tables.batch_predict_test: test_batch_p... | open | 2024-02-03T10:48:23Z | 2024-02-04T10:25:10Z | | 1679546805 | api: automl | Issues related to the AutoML API. | 5bcf2f |
| 110 | 2116420725 | automl.tables.batch_predict_test: test_batch_p... | open | 2024-02-03T10:48:23Z | 2024-02-04T10:25:10Z | | 2209514716 | samples | Issues that are directly related to samples. | 9932CC |
| 111 | 2116420725 | automl.tables.batch_predict_test: test_batch_p... | open | 2024-02-03T10:48:23Z | 2024-02-04T10:25:10Z | | 2686747014 | flakybot: issue | An issue filed by the Flaky Bot. Should not be... | a9f9f7 |
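
For the storage and reporting steps of the brief, one possible approach is to load the flattened rows into SQLite and count issues per label per month for 2023 in SQL. This is a sketch under assumptions: SQLite is only one of many suitable stores, the table and helper names (`issues`, `labels`, `issue_labels`, `store_rows`, `monthly_label_counts`) are hypothetical, and each row is expected to have the same keys as the `labels_data` dictionaries above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS issues (
    issue_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,  -- ISO 8601 UTC string, as returned by the API
    updated_at TEXT,
    closed_at  TEXT
);
CREATE TABLE IF NOT EXISTS labels (
    label_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    color       TEXT
);
CREATE TABLE IF NOT EXISTS issue_labels (
    issue_id INTEGER NOT NULL REFERENCES issues(issue_id),
    label_id INTEGER NOT NULL REFERENCES labels(label_id),
    PRIMARY KEY (issue_id, label_id)
);
"""


def store_rows(conn, rows):
    """Upsert flattened issue/label rows into the three tables."""
    conn.executescript(SCHEMA)
    with conn:  # commit on success, roll back on error
        for r in rows:
            conn.execute(
                "INSERT OR REPLACE INTO issues VALUES (?, ?, ?, ?, ?, ?)",
                (r["issue_id"], r["issue_title"], r["status"],
                 r["created_at"], r["updated_at"], r["closed_at"]),
            )
            conn.execute(
                "INSERT OR REPLACE INTO labels VALUES (?, ?, ?, ?)",
                (r["label_id"], r["label_name"],
                 r["label_description"], r["label_color"]),
            )
            conn.execute(
                "INSERT OR IGNORE INTO issue_labels VALUES (?, ?)",
                (r["issue_id"], r["label_id"]),
            )


def monthly_label_counts(conn, year="2023"):
    """Count issues created per month and label within the given year."""
    return conn.execute(
        """
        SELECT substr(i.created_at, 1, 7) AS month, l.name, COUNT(*) AS n
        FROM issue_labels il
        JOIN issues i ON i.issue_id = il.issue_id
        JOIN labels l ON l.label_id = il.label_id
        WHERE substr(i.created_at, 1, 4) = ?
        GROUP BY month, l.name
        ORDER BY month, l.name
        """,
        (year,),
    ).fetchall()
```

A possible usage is `conn = sqlite3.connect("issues.db")` followed by `store_rows(conn, labels_data)`, where `issues.db` is a placeholder file name. `monthly_label_counts(conn)` then returns `(month, label, count)` tuples that can be passed to any charting tool. Re-running `store_rows` on the same rows does not create duplicates, because the primary keys make the inserts idempotent.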

