<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# GitHub - Get DataFrame with issue estimate from project view
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/GitHub/GitHub_Get_DataFrame_with_issue_estimate_from_project_view.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://bit.ly/3JyWIk6">Give Feedback</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=GitHub+-+Get+DataFrame+with+issue+estimate+from+project+view:+Error+short+description">Bug report</a>

**Tags:** #github #dataframe #beautifulsoup #projectview #scraping #python

**Author:** [Benjamin Filly](https://www.linkedin.com/in/benjamin-filly-05427727a/)

**Last update:** 2023-07-31 (Created: 2023-07-20)

**Description:** This notebook demonstrates how to retrieve a dataframe containing issue estimates from the project view using BeautifulSoup. Since GitHub's API doesn't offer a way to fetch issue estimates directly, this method allows us to obtain these estimates and generate statistics by assignee and iteration. To use this template, you must create a view with columns in the following order:
- Issue Title
- Assignees
- Estimate
- Linked pull requests

**References:**
- [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [GitHub Project View](https://help.github.com/en/github/managing-your-work-on-github/about-project-boards)

## Input

### Import libraries

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from IPython.display import display

### Setup Variables
- `url`: URL of the project view page
- `assignee_name`: Login of the assignee to filter on. Set it to `None` to keep every row.

In [None]:
url = "https://github.com/orgs/jupyter-naas/projects/10/views/20"
assignee_name = None  # If None, the DataFrame is not filtered

## Model

### Get Data from project view

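This cell downloads the project view page, locates the `memex-items-data` script tag with BeautifulSoup, and extracts the title, issue number, assignee, pull request URL, and estimate of each item by splitting the embedded text. The request is unauthenticated, so the project must be publicly visible.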

In [None]:
# Init
data = []

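# Get HTML from URL and stop early on an HTTP error (e.g. 404)
response = requests.get(url)
response.raise_for_status()
html = response.text

# Parse HTML
soup = BeautifulSoup(html, "html.parser")

# Get the script tag that embeds the project items
elements = soup.find_all("script", {"id": "memex-items-data"})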

# Iterate over the elements and split their text
for element in elements:
    text = element.text
    split_text = text.split('{"contentId":')[1:]  # Split the text as needed
    
    # Split the soup for each element
    for s in split_text:
        s = s.split('"memexProjectColumnId":')[1:]
        # Get the values using splits
        title = s[0].split('"raw":"')[-1].split('"')[0]
        issue_number = s[0].split('"number":')[-1].split(',')[0]
        assignees = s[1].split('"login":"')[-1].split('"')[0]
        PR_url = s[2].split('"url":"')[-1].split('"')[0]
        estimate = s[3].split('"value":')[-1].split('}')[0]
        
        # Handle possible error
        if not str(issue_number).isdigit():
            issue_number = "❌ Error"
        
        # Create a dictionary with the values
        tmp = {
            "Title": title,
            "Issue Number": issue_number,
            "Assignees": assignees,
            "PR URL": PR_url,
            "Estimate": estimate,
        }
        # Append the dictionary to the data list
        data.append(tmp)
        
# Create a DataFrame from the data list
df_init = pd.DataFrame(data)
df_init
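
The text inside the `memex-items-data` script tag appears to be JSON, so the same extraction could be sketched with `json.loads` instead of string splits. The sample payload below is hypothetical: its shape (a list of items, each with a `memexProjectColumnValues` list whose entries carry a `value`) is an assumption inferred from the keys the cell above splits on, and the real page may differ. The helper name `parse_items` is also illustrative. Columns are read by position, as in the cell above.

```python
import json

# Hypothetical sample shaped after the keys used in the cell above
# ("contentId", "memexProjectColumnId", "raw", "number", "login", "url", "value").
# The "memexProjectColumnValues" wrapper key is an assumption.
sample_text = json.dumps([
    {
        "contentId": 1,
        "memexProjectColumnValues": [
            {"memexProjectColumnId": "Title",
             "value": {"title": {"raw": "Fix login bug"}, "number": 42}},
            {"memexProjectColumnId": "Assignees",
             "value": [{"login": "octocat"}]},
            {"memexProjectColumnId": "Linked pull requests",
             "value": [{"url": "https://github.com/example/repo/pull/7"}]},
            {"memexProjectColumnId": 1234, "value": {"value": 3}},
        ],
    },
    {
        "contentId": 2,
        "memexProjectColumnValues": [
            {"memexProjectColumnId": "Title",
             "value": {"title": {"raw": "Write docs"}, "number": 43}},
            {"memexProjectColumnId": "Assignees", "value": []},
            {"memexProjectColumnId": "Linked pull requests", "value": []},
            {"memexProjectColumnId": 1234, "value": None},
        ],
    },
])


def parse_items(text):
    """Turn the embedded JSON text into a list of row dictionaries."""
    rows = []
    for item in json.loads(text):
        columns = item.get("memexProjectColumnValues", [])
        # Pad with None so a missing column does not raise an IndexError
        values = [column.get("value") for column in columns] + [None] * 4
        title_v, assignees_v, prs_v, estimate_v = values[:4]
        title_v = title_v or {}
        rows.append({
            "Title": (title_v.get("title") or {}).get("raw"),
            "Issue Number": title_v.get("number"),
            # Join every login instead of keeping only the last one
            "Assignees": ", ".join(a["login"] for a in assignees_v or []),
            "PR URL": prs_v[-1]["url"] if prs_v else None,
            "Estimate": (estimate_v or {}).get("value"),
        })
    return rows


rows = parse_items(sample_text)
```

With real data, `element.text` from the loop above would take the place of `sample_text`, and `pd.DataFrame(rows)` would build the DataFrame. Parsing the JSON keeps titles that contain quotes intact and returns `None` for a missing estimate rather than the string `"null"`.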

## Output

### Creating and customising a dataframe

In [None]:
df = df_init.copy()

# Convert the 'Estimate' column to integers
# "null" and values that cannot be parsed become 0 (astype(int) fails on NaN)
df['Estimate'] = df['Estimate'].str.replace("null", "0")
df['Estimate'] = pd.to_numeric(df['Estimate'], errors='coerce').fillna(0).astype(int)

# Filter by assignee only when a name was provided
if assignee_name is not None:
    filtered_df = df[df['Assignees'] == assignee_name].reset_index(drop=True)
else:
    # Work on a copy so that later edits do not modify df itself
    filtered_df = df.copy()

# Format PR URL as clickable link
filtered_df.loc[:, 'PR URL'] = filtered_df['PR URL'].apply(lambda x: f'<a href="{x}" target="_blank">{x}</a>')

# Apply custom styling to the DataFrame
styled_df = filtered_df.style \
    .set_properties(**{'max-width': '200px'}) \
    .background_gradient(subset=['Estimate'], cmap='Blues') \
    .highlight_null(color='lightgrey') \
    .highlight_max(subset=['Estimate'], color='lightgreen') \
    .highlight_min(subset=['Estimate'], color='lightcoral')
styled_df
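
The description mentions statistics by assignee. A minimal sketch of that aggregation follows, using a small hypothetical DataFrame with the same column names as the one built in this notebook. The sample rows and the `stats` name are illustrative only. The iteration is not among the extracted columns, so only the per-assignee grouping is shown.

```python
import pandas as pd

# Hypothetical rows with the same columns as the notebook's DataFrame
sample = pd.DataFrame({
    "Title": ["Fix login bug", "Write docs", "Add tests"],
    "Assignees": ["octocat", "hubot", "octocat"],
    "Estimate": [3, 0, 5],
})

# Count issues and sum estimates per assignee, largest total first
stats = (
    sample.groupby("Assignees", as_index=False)
    .agg(Issues=("Title", "count"), Total_Estimate=("Estimate", "sum"))
    .sort_values("Total_Estimate", ascending=False)
    .reset_index(drop=True)
)
print(stats)
```

Replacing `sample` with the numeric `df` from the cell above would produce the same table for the real project view.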