<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# GitHub - Get DataFrame from project view

**Tags:** #github #dataframe #beautifulsoup #projectview #scraping #python

**Author:** [Benjamin Filly](https://www.linkedin.com/in/benjamin-filly-05427727a/)

**Description:** This notebook shows how to build a DataFrame from a GitHub project view using BeautifulSoup. It is useful for organizations that want to quickly extract data from a GitHub project view.

**References:**
- [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [GitHub Project View](https://help.github.com/en/github/managing-your-work-on-github/about-project-boards)

## Input

### Import libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from IPython.display import display

### Setup Variables
- `url`: URL of the project view page
- `assignee_name`: Name of the assignee to filter on; set it to `None` to skip filtering

In [2]:
url = "https://github.com/orgs/jupyter-naas/projects/10/views/19"
assignee_name = None  # If None, the DataFrame is not filtered

## Model

### Get Data from project view

This cell fetches the project view page, locates the embedded items data with BeautifulSoup, and collects one dictionary per project item.

In [3]:
# Init
data = []

# Get HTML from URL (unauthenticated request)
response = requests.get(url)
response.raise_for_status()  # Stop early on HTTP errors such as 404
html = response.text

# Parse HTML
soup = BeautifulSoup(html, "html.parser")

# Get the script tag(s) holding the project items data
elements = soup.find_all("script", {"id": "memex-items-data"})

# Iterate over the elements and split their text
for element in elements:
    text = element.text
    split_text = text.split('{"contentId":')[1:]  # Split the text as needed
    
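    # Split each item into its column values
    for s in split_text:
        s = s.split('"memexProjectColumnId":')[1:]
        # Extract the values by position (assumes this column order:
        # title, assignees, status, pull request, iteration, estimate)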
        title = s[0].split('"raw":"')[-1].split('"')[0]
        issue_number = s[0].split('"number":')[-1].split(',')[0]
        assignees = s[1].split('"login":"')[-1].split('"')[0]
        status_id = s[2].split('"id":"')[-1].split('"')[0]
        PR_url = s[3].split('"url":"')[-1].split('"')[0]
        iteration_id = s[4].split('"id":"')[-1].split('"')[0]
        estimate = s[5].split('"value":')[-1].split('}')[0]
        
        # Handle possible error
        if not str(issue_number).isdigit():
            issue_number = "❌ Error"
        
        # Turn the status ID into a readable label
        status_labels = {
            "7c2c8541": "✅ Done",
            "a89e8c1e": "🔖 Ready",
            "03352485": "📝 Backlog",
            "359edd26": "🏗 In Progress",
            "689c0021": "👀 In Review",
        }
        status_id = status_labels.get(status_id, "❌ None")

        # Create a dictionary with the values
        tmp = {
            "Title": title,
            "Issue Number": issue_number,
            "Assignees": assignees,
            "Status ID": status_id,
            "PR URL": PR_url,
            "Iteration ID": iteration_id,
            "Estimate": estimate,
        }
        # Append the dictionary to the data list
        data.append(tmp)
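
The string splits above depend on the exact order of keys and columns in the embedded payload. Assuming the `memex-items-data` script holds JSON (which the keys used above suggest), a sturdier approach could decode it with `json.loads` and look columns up by ID. The sketch below runs on a hypothetical, simplified stand-in for `element.text`; the `columns` key name and the layout of each `value` are illustrative assumptions, not GitHub's documented format.

```python
import json

# Hypothetical, simplified stand-in for `element.text`; the real payload
# has more fields and its exact layout is an assumption here.
text = """
[
  {"contentId": 1,
   "columns": [
     {"memexProjectColumnId": "Title",
      "value": {"title": {"raw": "Rename production folder"}, "number": 351}},
     {"memexProjectColumnId": "Assignees",
      "value": [{"login": "Dr0p42"}]}
   ]}
]
"""

items = json.loads(text)

rows = []
for item in items:
    # Index the column values by column ID instead of by position
    columns = {c["memexProjectColumnId"]: c["value"] for c in item["columns"]}
    title = columns.get("Title", {})
    assignees = columns.get("Assignees") or []
    rows.append({
        "Title": title.get("title", {}).get("raw"),
        "Issue Number": title.get("number"),
        "Assignees": ", ".join(a["login"] for a in assignees),
    })

print(rows)
```

Looking values up by column ID means a reordered or missing column yields `None` or an empty string rather than shifting every later field.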


### Creating and customising a dataframe

In [4]:
# Create a DataFrame from the data list
df = pd.DataFrame(data)

# Convert 'Estimate' column to numerical data type
df['Estimate'] = pd.to_numeric(df['Estimate'], errors='coerce')

# Check if assignee is not None before filtering
if assignee_name is not None:
    # Filter by assignee (copy so later edits do not touch a slice of df)
    filtered_df = df[df['Assignees'] == assignee_name].copy()
else:
    # Use a copy of the original DataFrame without filtering
    filtered_df = df.copy()

# Format PR URL as clickable link (leave empty URLs empty)
filtered_df['PR URL'] = filtered_df['PR URL'].apply(lambda x: f'<a href="{x}" target="_blank">{x}</a>' if x else '')

# Apply custom styling to the DataFrame
styled_df = filtered_df.style \
    .set_properties(**{'max-width': '200px'}) \
    .background_gradient(subset=['Estimate'], cmap='Blues') \
    .highlight_null('lightgrey') \
    .highlight_max(subset=['Estimate'], color='lightgreen') \
    .highlight_min(subset=['Estimate'], color='lightcoral')
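
The numeric conversion and the assignee filter above can be tried in isolation on a small sample. The sketch below uses hypothetical rows, with `"null"` standing in for an estimate that was not set (an assumption about what the scraped text contains in that case).

```python
import pandas as pd

# Hypothetical sample rows; "null" mimics an estimate that is not set
sample = pd.DataFrame({
    "Assignees": ["knshkp", "Dr0p42", "Dr0p42"],
    "Estimate": ["2.0", "null", "1.0"],
})

# Non-numeric strings become NaN instead of raising an error
sample["Estimate"] = pd.to_numeric(sample["Estimate"], errors="coerce")

# Boolean-mask filter; .copy() makes the result independent of `sample`
assignee_name = "Dr0p42"
filtered = sample[sample["Assignees"] == assignee_name].copy()

# Sum of the remaining estimates; NaN values are skipped by default
total_estimate = filtered["Estimate"].sum()
print(filtered)
print(total_estimate)
```

With `errors="coerce"`, missing estimates show up as `NaN`, which is what `highlight_null` later shades in the styled table.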

## Output

### Display result

In [5]:
styled_df

|  | Title | Issue Number | Assignees | Status ID | PR URL | Iteration ID | Estimate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Python - Create GitHub repository | 1217 | knshkp | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1220 | 7bb90100 | 2.0 |
| 1 | Clone awesome-notebooks folder in Naas cloud file system (name: `__templates__`) | 356 | Dr0p42 | ✅ Done | https://github.com/jupyter-naas/naas/pull/354 | 16173293 |  |
| 2 | Rename Get started folder | 353 | Dr0p42 | ✅ Done | https://github.com/jupyter-naas/naas/pull/354 | 16173293 |  |
| 3 | Rename production folder | 351 | Dr0p42 | ✅ Done | https://github.com/jupyter-naas/naas/pull/354 | 16173293 |  |
| 4 | Python - Get current city weather | 1207 | knshkp | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1219 | 16173293 |  |
| 5 | Python - Extract text from a PDF | 1194 | MinuraPunchihewa | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1211 | 16173293 |  |
| 6 | Python - Download image from URL | 434 | aybruhm | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1230 | 16173293 |  |
| 7 | Python - Match pattern with regular expressions | 1200 | srini047 | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1222 | 16173293 |  |
| 8 | Fix error on folder/notebooks automatically updated via CI | 1209 | Dr0p42 | ✅ Done |  | 762ca0a8 | 1.0 |
| 9 | Python - Compress images | 1202 | Mohwit | ✅ Done | https://github.com/jupyter-naas/awesome-notebooks/pull/1792 | 16173293 |  |
