# Github Popularity Analysis

# 1 Intro
## 1.1 What's Github?

In [19]:
import requests
import numpy as np
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor
import requests
import os
from concurrent.futures import ThreadPoolExecutor
from bs4 import BeautifulSoup
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# 2 Official Rest API
descrizione dell’uso delle API ufficiali REST, riportando almeno un esempio di utilizzo;

## Access Token

To interact with GitHub APIs, you'll need a personal access token (PAT). This token allows you to authenticate and securely access your resources on GitHub.

## Creating a Token:
1. Go to GitHub.
2. Click on your avatar at the top right and select **Settings**.
3. On the left sidebar, navigate to **Developer settings** > **Personal access tokens**.
4. Click **Generate new token**.
5. Select the desired permissions for your token (e.g., **repo**, **user**, etc.).
6. Copy the generated token, as it will no longer be visible after creation.

In [2]:
import os

# Alternatively, you can store it in a variable directly
token = os.getenv('GITHUB_TOKEN')

## Get User Information

One of the first things you can do with the GitHub API is retrieve information about your own user account.
This request will return the authenticated user's details, such as their username, ID, and email (if public).

In [31]:
import requests
from IPython.display import Image

# Set up the headers with the authorization token
headers = {'Authorization': f'token {token}'}

# Make the request to get the user information
response = requests.get('https://api.github.com/user', headers=headers)

# Check if the request was successful
if response.status_code == 200:
    user_info = response.json()
    
    # Display user information in a readable way
    print(f"Username: {user_info['login']}")
    print(f"Name: {user_info['name']}")
    print(f"ID: {user_info['id']}")
    print(f"Email: {user_info.get('email', 'Not Public')}")
    print(f"Bio: {user_info.get('bio', 'No Bio available')}")
    print(f"Public Repositories: {user_info['public_repos']}")
    print(f"Followers: {user_info['followers']}")
    print(f"Following: {user_info['following']}")
    display(Image(url=user_info['avatar_url']))
else:
    print(f"Failed to fetch user info. Status code: {response.status_code}")


Username: Kespers
Name: Kevin Speranza
ID: 70778427
Email: None
Bio: None
Public Repositories: 3
Followers: 3
Following: 4


## List Repositories for a User

You can use the API to list all the repositories for a given user. By default, it will show public repositories, but you can also include private ones if you have the necessary permissions.

This request will list all public repositories for the specified GitHub user.

In [32]:
# Example: List repositories for a specific user
username = user_info['login']
url = f'https://api.github.com/users/{username}/repos'

response = requests.get(url, headers=headers)

if response.status_code == 200:
    repos = response.json()
    for repo in repos:
        repo_name = repo['name']
        repo_url = repo['html_url']
        print(f"Repository Name: {repo_name}\nRepository Link: {repo_url}\n")
else:
    print(f"Failed to fetch repositories. Status code: {response.status_code}")

Repository Name: github-popularity-analysis
Repository Link: https://github.com/Kespers/github-popularity-analysis

Repository Name: kCHORDS
Repository Link: https://github.com/Kespers/kCHORDS

Repository Name: kCHORDS-recommendation-system
Repository Link: https://github.com/Kespers/kCHORDS-recommendation-system



## Create a New Repository
You can create a new repository under your GitHub account by making a POST request to the GitHub API.

This request creates a new repository with the specified name.

In [10]:
repo_data = {
    'name': 'new-repository',
    'description': 'This is a description of my new repository.',
    'private': True
}

# Send a POST request to create the repository
create_repo_url = 'https://api.github.com/user/repos'
response = requests.post(create_repo_url, headers=headers, json=repo_data)

if response.status_code == 201:
    print("Repository created successfully!")
else:
    print(f"Failed to create repository. Status code: {response.status_code}")

Repository created successfully!


## Issues

### creation
You can create an issue in a repository, which is useful for bug tracking or feature requests.

This request creates a new issue in the specified repository with a title and description.

In [13]:
# Example: Create an issue in a repository
issue_data = {
    'title': 'Bug: Something went wrong',
    'body': 'Details about the bug go here.',
}

# Send a POST request to create the issue
create_issue_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}/issues'
response = requests.post(create_issue_url, headers=headers, json=issue_data)

if response.status_code == 201:
    print("Issue created successfully!")
else:
    print(f"Failed to create issue. Status code: {response.status_code}")

Issue created successfully!


### List
GitHub's API also allows you to list issues for a particular repository. This can be helpful for tracking bug reports or feature requests.

In [14]:
issues_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}/issues'

response = requests.get(issues_url, headers=headers)

if response.status_code == 200:
    issues = response.json()
    for issue in issues:
        print(f"Issue #{issue['number']}: {issue['title']}")
else:
    print(f"Failed to fetch issues. Status code: {response.status_code}")

Issue #1: Bug: Something went wrong


## Pull Requests
To create a pull request, follow these steps:

1. **Create a new branch** based on the base branch.
2. **Push your changes** to the newly created branch.
3. **Create a pull request** to merge the new branch into the base branch.

### Create a New Branch

To create a new branch, you need to:

1. Retrieve the latest commit SHA from the base branch.
2. Create a new branch (`head` branch) from the latest commit.

This is how you can create a branch using the GitHub API:

1. **Get the latest commit SHA** from the base branch.
2. **Create a new branch** with that commit SHA.

In [16]:
base_branch = 'main'  # The base branch to create the new branch from
head_branch = 'feature-branch'  # The branch you want to create

# API endpoint to get the base branch details
get_base_branch_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}/branches/{base_branch}'
response = requests.get(get_base_branch_url, headers=headers)

if response.status_code == 200:
    base_branch_info = response.json()
    base_sha = base_branch_info['commit']['sha']
    
    # Create the new branch (head branch)
    create_branch_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}/git/refs'
    branch_data = {
        'ref': f'refs/heads/{head_branch}',
        'sha': base_sha  # Create the new branch from the latest commit of the base branch
    }
    
    branch_response = requests.post(create_branch_url, headers=headers, json=branch_data)

    if branch_response.status_code == 201:
        print(f"Branch '{head_branch}' created successfully.")
    else:
        print(f"Failed to create branch. Status code: {branch_response.status_code}")
else:
    print(f"Failed to get base branch info. Status code: {response.status_code}")

Branch 'feature-branch' created successfully.


### Create a Pull Request

Once the branch is created, you can create a pull request to merge it into the base branch. Here’s how you do it:

1. **Create a pull request** from your `head` branch to the base branch.
2. Set the title, description, and specify the `head` (feature branch) and `base` (main branch).

Here is an example to create a pull request:

1. Define the title, body, head (feature branch), and base (main branch).
2. Submit the pull request via the API.

In [None]:
import json

headers = {
    "Authorization" : "token {}".format(token),
    "Accept" : "application/vnd.github.sailor-v-preview+json"
}
data= {
    "title" : "PullRequest-Using-GithubAPI",
    "body" : "I have amazing new Features",
    "head" : head_branch,
    "base" : base_branch
}

url = "https://api.github.com/repos/{}/{}/pulls".format(username,repo_data["name"])
response = requests.post(url,data=json.dumps(data), headers=headers)
response

<Response [201]>


### List All Pull Requests

To list all pull requests for a repository, follow these steps:

1. Query the `/pulls` endpoint to get all open pull requests.
2. You can filter the list for open, closed, or merged pull requests.

Here’s an example to list pull requests for a repository:

1. Make a `GET` request to `/pulls` endpoint.
2. Display the list of pull requests.

In [26]:
# API endpoint to list pull requests
list_prs_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}/pulls'
response = requests.get(list_prs_url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    pull_requests = response.json()
    for pr in pull_requests:
        pr_number = pr['number']
        pr_title = pr['title']
        pr_url = pr['html_url']
        print(f"PR #{pr_number}: {pr_title}")
else:
    print(f"Failed to fetch pull requests. Status code: {response.status_code}")

PR #1: PullRequest-Using-GithubAPI


### Delete a Repository

To delete a repository, you can send a **DELETE** request to the GitHub API. The following steps describe the process:

1. You need to authenticate with a personal access token that has the required permissions.
2. Send a **DELETE** request to the repository's URL.
3. If the request is successful, the repository will be permanently deleted.

Here’s an example of how to delete a repository:

1. Authenticate with your personal access token.
2. Make a **DELETE** request to the repository’s URL.

In [29]:
headers = {'Authorization': f'token {token}'}

# API endpoint to delete the repository
delete_repo_url = f'https://api.github.com/repos/{username}/{repo_data["name"]}'

# Send the DELETE request
print(f"Requesting to delete the repository: {delete_repo_url}...")

response = requests.delete(delete_repo_url, headers=headers)

# Check if the request was successful
if response.status_code == 204:
    print(f"Repository {repo_data['name']} deleted successfully.")
else:
    print(f"Failed to delete repository. Status code: {response.status_code}")
    try:
        # Print the response details for more info
        print("Response details:", response.json())
    except ValueError:
        print("No detailed error message returned.")

Requesting to delete the repository: https://api.github.com/repos/Kespers/new-repository...
Repository new-repository deleted successfully.


# Data extraction & analysis
utilizzo delle API per estrarre dati dal social media e analizzarli per
estrarre informazione dal social media. L’analisi può essere anche molto semplice;

Project Popularity analysis
- i progetti in cui committano sempre le stesse persone?
- velocità di risoluzione issue?
- è creato / mantenuto da persone che sono famosi (tanti follower)
- è un progetto AI?

## Extraction

### Getting the trending repositories
prima di tutto prendiamo le 1000 repo piu famose dell'ultimo anno. in quanto github non fornisce questa info e non esistono api che permettono di farlo andremo ad effettuare scraping del soto [trendshift](https://trendshift.io) tramite bigsoup

In [163]:
# URL della pagina da cui fare lo scraping
URL = "https://trendshift.io/?trending-range=360&trending-limit=1000"

# Scarica il contenuto della pagina
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(URL, headers=headers)

In [164]:
print(response.status_code)

200


In [165]:
soup = BeautifulSoup(response.text, "html.parser")

In [166]:
# Trova tutti gli oggetti (blocchi di repository)
repo_blocks = soup.find_all("div", class_="bg-white rounded-lg border border-gray-200 px-4 py-3")
repo_blocks[:5]

[<div class="bg-white rounded-lg border border-gray-200 px-4 py-3"><div class="md:flex md:justify-between md:items-center text-sm md:text-base mb-1"><a class="text-indigo-400 font-medium hover:underline max-w-[3/4] break-all" href="/repositories/2841">codecrafters-io/build-your-own-x</a><div class="hidden md:block"><div class="text-gray-500 flex items-center text-xs md:text-sm"><span class="w-[12px] h-[12px] md:w-3 md:h-3 rounded-full mr-1 shrink-0" style="background-color:#DA5B0B"></span>Markdown</div></div></div><div class="mb-2"><div class="flex items-center space-x-3 text-xs text-gray-500"><div class="flex items-center"><svg aria-hidden="true" class="w-4 h-4 mr-1 text-gray-300" stroke="currentColor" viewbox="0 0 16 16"><path d="M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6

In [167]:
repo_list = []

for block in repo_blocks:
	try:
		# Nome repo e link interno
		repo_tag = block.find("a", class_="text-indigo-400 font-medium hover:underline max-w-[3/4] break-all")
		repo_name = repo_tag.text.strip()
		repo_link = "https://trendshift.io" + repo_tag["href"]

		# Linguaggio di programmazione
		lang_tag = block.find("div", class_="text-gray-500 flex items-center text-xs md:text-sm")
		language = lang_tag.text.strip() if lang_tag else "Unknown"

		# Link GitHub
		github_tag = block.find("a", href=True, text="Visit GitHub")
		github_link = github_tag["href"] if github_tag else "N/A"

		# Formatta l'oggetto
		repo_info = {
			"name": repo_name,
			"repo_link": repo_link,
			"language": language,
			"github_link": github_link,
		}

		repo_list.append(repo_info)
	
	except Exception as e:
		print(f"Errore nel parsing di un blocco: {e}\n\n")

  github_tag = block.find("a", href=True, text="Visit GitHub")


In [168]:
repo_list[0]

{'name': 'codecrafters-io/build-your-own-x',
 'repo_link': 'https://trendshift.io/repositories/2841',
 'language': 'Markdown',
 'github_link': 'https://github.com/codecrafters-io/build-your-own-x'}

In [15]:
def save_to_file(path, data):
    file_path = os.path.join(path)
    with open(file_path, 'w') as f:
        for d in data:
            f.write(f'{d}\n')

def read_from_file(path):	
	if os.path.exists(path):
		items = []
		with open(path, 'r') as f:
			lines = f.readlines()
			for line in lines:
				items.append(line.strip())

		return items
	else:
		return []
	
# Funzione per estrarre "owner" e "repo name" da un link GitHub
def parse_github_url(url):
    parts = url.rstrip("/").split("/")
    if len(parts) >= 2:
        return parts[-2], parts[-1]  # Owner, Repo
    return None, None

## KPIs

### Commit authors
per le prime 30 posizioni

In [91]:
def get_commit_authors(owner, repo, token):
    url = f'https://api.github.com/repos/{owner}/{repo}/commits'
    headers = {'Authorization': f'token {token}'}
    
    params = {'per_page': 100}
    response = requests.get(url, headers=headers, params=params)
    
    authors = []
    if response.status_code == 200:
        commits_data = response.json()
        for commit in commits_data:
            author = commit["commit"]["author"]["name"]  # Usa il nome dell'autore
            authors.append(author)
    else:
        print(f'Error fetching commits for {repo}: {response.status_code}, {response.content}')
    
    return list(authors)

def process_repo(repo, token):
	owner, repo_name = parse_github_url(repo["github_link"])

	print(repo_name)

	# Ottieni gli autori dei commit
	commit_authors = get_commit_authors(owner, repo_name, token)

	# Salva gli autori in un file
	save_to_file(f'results/project/contributors/{repo_name}.txt', commit_authors)

with ThreadPoolExecutor() as executor:
    executor.map(lambda repo: process_repo(repo, token), repo_list[:30])

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflow
hello-algo
public-apis
coding-interview-university
firecrawl
fabric
MoneyPrinterTurbo
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
swift-composable-architecture
fish-speech
MaxKB
Deep-Live-Cam


### Velocità chiusura issue

In [92]:
# Funzione per calcolare il tempo medio di risoluzione delle issue
def get_issue_resolution_time(owner, repo):
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    params = {"state": "closed", "per_page": 100}  # Prende fino a 100 issue chiuse
    headers = {"Authorization": f"token {token}"}  # Usa la variabile `token`

    response = requests.get(url, headers=headers, params=params)

    if response.status_code != 200:
        print(f"Errore {response.status_code} per {repo}")
        return None

    issues = response.json()
    resolution_times = []

    for issue in issues:
        if "pull_request" in issue:  # Escludi PR
            continue
        created_at = datetime.strptime(issue["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        closed_at = issue.get("closed_at")
        
        if closed_at:
            closed_at = datetime.strptime(closed_at, "%Y-%m-%dT%H:%M:%SZ")
            resolution_times.append((closed_at - created_at).total_seconds())

    if resolution_times:
        avg_time = np.mean(resolution_times) / 3600  # Converti in ore
        return avg_time
    return None

# Funzione per processare una repo in parallelo
def process_issue_closing_time(repo):
	owner, repo_name = parse_github_url(repo["github_link"])
     
	print(repo_name)

	avg_resolution_time = get_issue_resolution_time(owner, repo_name)
		
	save_to_file(f"results/project/resolution_time/{repo_name}.txt", [f"{avg_resolution_time:.2f}"] if avg_resolution_time else [])

# Esegui in parallelo con ThreadPoolExecutor
with ThreadPoolExecutor() as executor:  # Puoi aumentare il numero di thread se necessario
    executor.map(process_issue_closing_time, repo_list)

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflow
hello-algo
public-apis
coding-interview-university
firecrawl
fabric
MoneyPrinterTurbo
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
swift-composable-architecture
fish-speech
MaxKB
Deep-Live-Cam
OpenHands
MinerU
godot
googletest
supervision
screenshot-to-code
maybe
data-engineer-handbook
khoj
dice
weekly
Scrapegraph-ai
rustdesk
AFFiNE
postiz-app
Python
LazyVim
OpenDevin
lerobot
zapret
crawl4ai
exo
bitcoin
anything-llm
EasySpider
ente
tech-interview-handbook
GoodbyeDPI
kestra
LibreChat
it-tools
open-interpreter
MiniCPM-V
awesome-llm-apps
glance
awesome-deepseek-integration
freqtrade
CopilotKit
nvm
Perplexica
jan
GitHubDaily
firebase-ios-sdk
bootstrap
llm-course
zed
follow
docling
ToolJet
mistral.rs
llama-stack
shardeum
unstract
computer-science
PowerToys
coolify
ui
Docker-OSX
platform
uv
foll

### Is AI related

In [209]:
import requests

def is_ai_repo(owner, repo, token):
    # Controlla i topic del repository per parole chiave legate all'AI
    url = f"https://api.github.com/repos/{owner}/{repo}/topics"
    headers = {"Authorization": f"token {token}"}
    
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        topics = response.json().get("names", [])
        ai_keywords = ["ai", "machine-learning", "deep-learning", "neural-networks", "nlp", "reinforcement-learning"]
        
        for topic in topics:
            if topic.lower() in ai_keywords:
                return True
    else:
        print(f"Errore nella richiesta per {repo}: {response.status_code}")
    
    # Controlla nel README.md se contiene parole chiave relative all'AI
    readme_url = f"https://api.github.com/repos/{owner}/{repo}/readme"
    readme_response = requests.get(readme_url, headers=headers)
    
    if readme_response.status_code == 200:
        readme_content = readme_response.json().get("content", "")
        ai_keywords_in_readme = ai_keywords = [
			"AI", "Artificial Intelligence", "Machine Learning", "Deep Learning", 
			"Neural Networks", "Convolutional Neural Networks (CNN)", "Recurrent Neural Networks (RNN)", 
			"Generative Adversarial Networks (GAN)", "Natural Language Processing (NLP)", "Reinforcement Learning", 
			"Supervised Learning", "Unsupervised Learning", "Semi-supervised Learning", "Transfer Learning", 
			"Feature Engineering", "Gradient Descent", "Backpropagation", "Autoencoders", 
			"Support Vector Machines (SVM)", "Decision Trees", "Random Forests", "K-means", 
			"K-nearest Neighbors (KNN)", "Linear Regression", "Logistic Regression", "Clustering", 
			"Dimensionality Reduction", "Principal Component Analysis", "TensorFlow", "PyTorch", 
			"Keras", "Scikit-learn", "XGBoost", "LightGBM", "spaCy", "Transformers", "BERT", 
			"GPT", "OpenAI", "Reinforcement Learning", "AlphaGo", "Robotics", "Computer Vision", 
			"Speech Recognition", "Image Recognition", "Text Classification", "Named Entity Recognition (NER)", 
			"Object Detection", "Segmentation", "Face Recognition", "Audio Analysis", "Predictive Modeling", 
			"Time Series Forecasting", "Anomaly Detection", "Artificial Neural Networks (ANN)", "Robo-advisors"
		]
        
        for keyword in ai_keywords_in_readme:
            if keyword.lower() in readme_content.lower():
                return True

    return False

# Verifica se la repository è inerente all'AI

def process_is_ai(repo):
	owner, repo_name = parse_github_url(repo["github_link"])

	print(repo_name)
    
	save_to_file(f"results/project/is_ai/{repo_name}.txt", [is_ai_repo(owner, repo_name, token)])

with ThreadPoolExecutor() as executor:  # Puoi aumentare il numero di thread se necessario
    executor.map(process_is_ai, repo_list)

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflow
hello-algo
public-apis
coding-interview-university
MoneyPrinterTurbo
firecrawl
fabric
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
godot
swift-composable-architecture
fish-speech
MaxKB
Deep-Live-Cam
OpenHands
MinerU
googletest
supervision
screenshot-to-code
maybe
data-engineer-handbook
khoj
weekly
OpenDevin
open-interpreter
Scrapegraph-ai
rustdesk
postiz-app
Python
LazyVim
lerobot
zapret
dice
crawl4ai
exo
bitcoin
anything-llm
EasySpider
ente
GoodbyeDPI
kestra
LibreChat
it-tools
MiniCPM-V
awesome-llm-apps
awesome-deepseek-integration
AFFiNE
freqtrade
CopilotKit
nvm
LLMs-from-scratch
Perplexica
jan
GitHubDaily
firebase-ios-sdk
bootstrap
llm-course
zed
follow
docling
ToolJet
mistral.rs
llama-stack
tech-interview-handbook
shardeum
unstract
computer-science
PowerToys
coolify
ui
Docker-OSX
platf

### Watchers count

In [None]:
def get_watch_count(owner, repo, token):
    url = f"https://api.github.com/repos/{owner}/{repo}"
    headers = {'Authorization': f'token {token}'}
    response = requests.get(url, headers=headers)
    
    if response.status_code == 200:
        repo_data = response.json()
        watchers_count = repo_data.get("watchers_count", 0)
        return watchers_count
    else:
        return None


def process_watchers(repo):
	owner, repo_name = parse_github_url(repo["github_link"])

	print(repo_name)
    
	save_to_file(f"results/project/watchers/{repo_name}.txt", [get_watch_count(owner, repo_name, token)])

with ThreadPoolExecutor() as executor:
    executor.map(process_watchers, repo_list)

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflow
hello-algo
public-apis
coding-interview-university
MoneyPrinterTurbo
firecrawl
fabric
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
godot
swift-composable-architecture
fish-speech
MaxKB
Deep-Live-Cam
OpenHands
MinerU
googletest
supervision
screenshot-to-code
maybe
data-engineer-handbook
khoj
weekly
OpenDevin
open-interpreter
Scrapegraph-ai
rustdesk
postiz-app
Python
LazyVim
lerobot
zapret
dice
crawl4ai
exo
bitcoin
anything-llm
EasySpider
ente
GoodbyeDPI
kestra
LibreChat
it-tools
MiniCPM-V
awesome-llm-apps
awesome-deepseek-integration
AFFiNE
freqtrade
CopilotKit
nvm
LLMs-from-scratch
Perplexica
jan
GitHubDaily
firebase-ios-sdk
bootstrap
llm-course
zed
follow
docling
ToolJet
mistral.rs
llama-stack
tech-interview-handbook
shardeum
unstract
computer-science
PowerToys
coolify
ui
Docker-OSX
platf

### Contributors count
rate limit hittato

In [None]:
def get_contributor_count(url):
	# Scarica il contenuto della pagina
	headers = {"User-Agent": "Mozilla/5.0"}
	response = requests.get(url, headers=headers)

	page = BeautifulSoup(response.text, "html.parser")

	# Trova tutti gli oggetti (blocchi di repository
	numbers = [
    int(span['title']) for span in page.find_all('span', class_='Counter ml-1')
    if int(span['title']) > 0  # Filtra solo quelli > 0
	]

	return numbers[0]


def process_contributors(repo):
	owner, repo_name = parse_github_url(repo["github_link"])
    
	print(repo_name)

	save_to_file(f"results/project/contributors_count/{repo_name}.txt", [get_contributor_count(repo["github_link"])])

with ThreadPoolExecutor() as executor:
    executor.map(process_contributors, repo_list)

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflowhello-algo

public-apis
coding-interview-university
firecrawl
fabric
MoneyPrinterTurbo
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
swift-composable-architecture
fish-speechMaxKB

Deep-Live-Cam
OpenHands
MinerU
godot
googletest
supervision
screenshot-to-code
maybe
data-engineer-handbook
khoj
dice
weekly
Scrapegraph-ai
rustdesk
AFFiNE
postiz-app
Python
LazyVim
OpenDevin
lerobot
zapret
crawl4ai
exo
bitcoin
anything-llm
EasySpider
ente
tech-interview-handbook
GoodbyeDPI
kestra
LibreChat
it-tools
open-interpreter
MiniCPM-V
awesome-llm-apps
glance
awesome-deepseek-integration
freqtrade
CopilotKit
nvm
Perplexica
jan
GitHubDaily
firebase-ios-sdk
bootstrap
llm-course
zed
follow
docling
ToolJet
mistral.rs
llama-stack
shardeum
unstract
computer-science
PowerToys
coolify
ui
Docker-OSX
platform
uv
foll

### Forks

In [46]:
def convert_to_number(text):
    text = text.strip().lower()
    if text.endswith('k'):
        return int(float(text[:-1]) * 1_000)
    elif text.endswith('m'):
        return int(float(text[:-1]) * 1_000_000)
    elif text.endswith('b'):
        return int(float(text[:-1]) * 1_000_000_000)
    else:
        return None
        
def get_forks_count(url):
	# Scarica il contenuto della pagina
	headers = {"User-Agent": "Mozilla/5.0"}
	response = requests.get(url, headers=headers)

	# Verifica se la richiesta ha avuto successo
	if response.status_code != 200:
		print(f"Errore: {response.status_code}")
		return None

	# Parsing della pagina con BeautifulSoup
	soup = BeautifulSoup(response.text, "html.parser")

	# Trova lo span con ID "repo-network-counter"
	span = soup.find('span', id='repo-network-counter')

	# Estrai il valore numerico dal contenuto dello span (se esiste)
	return convert_to_number(span.text.strip()) if span else None


def process_forks(repo):
	owner, repo_name = parse_github_url(repo["github_link"])
    
	print(repo_name)

	save_to_file(f"results/project/forks/{repo_name}.txt", [get_forks_count(repo["github_link"])])

with ThreadPoolExecutor() as executor:
    executor.map(process_forks, repo_list)

build-your-own-x
lobe-chat
dify
project-based-learning
free-programming-books
system-design-primer
generative-ai-for-beginners
ollama
immich
freeCodeCamp
developer-roadmap
Stirling-PDF
ragflow
hello-algo
public-apis
coding-interview-university
firecrawl
fabric
MoneyPrinterTurbo
storm
twenty
phidata
drawdb
kotaemon
yt-dlp
ComfyUI
swift-composable-architecture
fish-speech
MaxKB
Deep-Live-Cam
OpenHands
MinerU
godot
googletest
supervision
screenshot-to-code
maybe
data-engineer-handbook
khoj
dice
weekly
Scrapegraph-ai
rustdesk
AFFiNE
postiz-app
Python
LazyVim
OpenDevin
lerobot
zapret
crawl4ai
exo
bitcoin
anything-llm
EasySpider
ente
tech-interview-handbook
GoodbyeDPI
kestra
LibreChat
it-tools
open-interpreter
MiniCPM-V
awesome-llm-apps
glance
awesome-deepseek-integration
freqtrade
CopilotKit
nvm
Perplexica
jan
GitHubDaily
firebase-ios-sdkbootstrap

llm-course
zed
follow
docling
ToolJet
mistral.rs
llama-stack
shardeum
unstract
computer-science
PowerToys
coolify
ui
Docker-OSX
platform
uv
foll

## Collect results

In [202]:
for repo in repo_list:
    name = repo["name"].split("/")[-1]
    
    repo["contributors"] = read_from_file(f'results/project/commit_authors/{name}.txt')
    repo["resolution_time_avg"] = read_from_file(f'results/project/resolution_time/{name}.txt')
    repo["is_ai"] = read_from_file(f'results/project/is_ai/{name}.txt')
    repo["watchers"] = read_from_file(f'results/project/watchers/{name}.txt')
    repo["contributors_count"] = read_from_file(f'results/project/contributors_count/{name}.txt')
    repo["forks"] = read_from_file(f'results/project/forks/{name}.txt')
    

repo_list = np.array(repo_list)
repo_list[0]

{'name': 'codecrafters-io/build-your-own-x',
 'repo_link': 'https://trendshift.io/repositories/2841',
 'language': 'Markdown',
 'github_link': 'https://github.com/codecrafters-io/build-your-own-x',
 'contributors': ['Paul Kuruvilla',
  'NintenHero',
  'Paul Kuruvilla',
  'Will Squibb',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Dmitry Dreko',
  'Dmitry Dreko',
  'Paul Kuruvilla',
  'Nicolás Montone',
  'Paul Kuruvilla',
  'João Pedro Lima',
  'Adam Ross',
  'Paul Kuruvilla',
  'João Pedro Lima',
  'Paul Kuruvilla',
  'DanyRenaudier',
  'Paul Kuruvilla',
  'karandeeppotato',
  'Paul Kuruvilla',
  'Aastik',
  'Paul Kuruvilla',
  'erwanvivien',
  'Paul Kuruvilla',
  'Josh Burns',
  'Paul Kuruvilla',
  'Ajay Prem Shankar',
  'Paul Kuruvilla',
  'root',
  'Paul Kuruvilla',
  'Matthew',
  'Paul Kuruvilla',
  'Sarup Banskota',
  'Sarup Banskota',
  'Paul Kuruvilla',
  'sj902',
  'Paul Kuruvilla',
  'Mansur',
  'Gokul2003g',
  'Paul Kuruvilla',
  'Ahmed 

## Normalizzazione

In [203]:
# Funzione di conversione
def convert(entry, is_int):
    count = entry
    
    if isinstance(count, list) and count:  # Verifica che sia una lista non vuota
        if count[0] == 'None':  # Se il primo valore è la stringa 'None'
            return 0
        else:
            try:
                # Convertiamo il primo elemento in float
                return int(count[0]) if is_int else float(count[0])
            except ValueError:
                # Se il valore non è convertibile in float, restituiamo 0
                return 0
    else:
        return 0  # Se non è una lista o la lista è vuota, restituiamo 0
		
for entry in repo_list:
	entry['is_ai'] = 1 if entry['is_ai'] == 'True' else 0
	entry['resolution_time_avg'] = convert(entry['resolution_time_avg'], False)
	entry['watchers'] = convert(entry['watchers'], True)
	entry['contributors_count'] = convert(entry['contributors_count'], True)
	entry['forks'] = convert(entry['forks'], True)

# Verifica
repo_list[0]

{'name': 'codecrafters-io/build-your-own-x',
 'repo_link': 'https://trendshift.io/repositories/2841',
 'language': 'Markdown',
 'github_link': 'https://github.com/codecrafters-io/build-your-own-x',
 'contributors': ['Paul Kuruvilla',
  'NintenHero',
  'Paul Kuruvilla',
  'Will Squibb',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Paul Kuruvilla',
  'Dmitry Dreko',
  'Dmitry Dreko',
  'Paul Kuruvilla',
  'Nicolás Montone',
  'Paul Kuruvilla',
  'João Pedro Lima',
  'Adam Ross',
  'Paul Kuruvilla',
  'João Pedro Lima',
  'Paul Kuruvilla',
  'DanyRenaudier',
  'Paul Kuruvilla',
  'karandeeppotato',
  'Paul Kuruvilla',
  'Aastik',
  'Paul Kuruvilla',
  'erwanvivien',
  'Paul Kuruvilla',
  'Josh Burns',
  'Paul Kuruvilla',
  'Ajay Prem Shankar',
  'Paul Kuruvilla',
  'root',
  'Paul Kuruvilla',
  'Matthew',
  'Paul Kuruvilla',
  'Sarup Banskota',
  'Sarup Banskota',
  'Paul Kuruvilla',
  'sj902',
  'Paul Kuruvilla',
  'Mansur',
  'Gokul2003g',
  'Paul Kuruvilla',
  'Ahmed 

In [204]:
df = pd.DataFrame(list(repo_list))

df

Unnamed: 0,name,repo_link,language,github_link,contributors,resolution_time_avg,is_ai,watchers,contributors_count,forks
0,codecrafters-io/build-your-own-x,https://trendshift.io/repositories/2841,Markdown,https://github.com/codecrafters-io/build-your-...,"[Paul Kuruvilla, NintenHero, Paul Kuruvilla, W...",52.24,0,359450,120,33500
1,lobehub/lobe-chat,https://trendshift.io/repositories/2256,TypeScript,https://github.com/lobehub/lobe-chat,"[lobehubbot, cnJasonZ, lobehubbot, semantic-re...",14.83,0,57837,227,12300
2,langgenius/dify,https://trendshift.io/repositories/2152,TypeScript,https://github.com/langgenius/dify,"[诗浓, Jyong, Jyong, crazywoola, Yongtao Huang, ...",3.58,0,83365,702,12400
3,practical-tutorials/project-based-learning,https://trendshift.io/repositories/2804,Unknown,https://github.com/practical-tutorials/project...,"[Axel Baudot, Cheese, Rory Donald, Rivaan Rana...",917.80,0,221293,104,28900
4,EbookFoundation/free-programming-books,https://trendshift.io/repositories/2657,HTML,https://github.com/EbookFoundation/free-progra...,"[Ivan Oranciuc, Artyom V. Poptsov, Apexq, Migl...",55.15,0,353411,432,63100
...,...,...,...,...,...,...,...,...,...,...
995,xai-org/grok-1,https://trendshift.io/repositories/8659,Python,https://github.com/xai-org/grok-1,[],0.00,0,0,7,8400
996,xlang-ai/OSWorld,https://trendshift.io/repositories/9346,Python,https://github.com/xlang-ai/OSWorld,[],0.00,0,0,26,0
997,adrianhajdin/banking,https://trendshift.io/repositories/9879,TypeScript,https://github.com/adrianhajdin/banking,[],0.00,0,0,2,0
998,Tencent/HunyuanDiT,https://trendshift.io/repositories/10188,Jupyter Notebook,https://github.com/Tencent/HunyuanDiT,[],0.00,0,0,7,0


In [205]:
df = df.drop(columns=['github_link'])
df

Unnamed: 0,name,repo_link,language,contributors,resolution_time_avg,is_ai,watchers,contributors_count,forks
0,codecrafters-io/build-your-own-x,https://trendshift.io/repositories/2841,Markdown,"[Paul Kuruvilla, NintenHero, Paul Kuruvilla, W...",52.24,0,359450,120,33500
1,lobehub/lobe-chat,https://trendshift.io/repositories/2256,TypeScript,"[lobehubbot, cnJasonZ, lobehubbot, semantic-re...",14.83,0,57837,227,12300
2,langgenius/dify,https://trendshift.io/repositories/2152,TypeScript,"[诗浓, Jyong, Jyong, crazywoola, Yongtao Huang, ...",3.58,0,83365,702,12400
3,practical-tutorials/project-based-learning,https://trendshift.io/repositories/2804,Unknown,"[Axel Baudot, Cheese, Rory Donald, Rivaan Rana...",917.80,0,221293,104,28900
4,EbookFoundation/free-programming-books,https://trendshift.io/repositories/2657,HTML,"[Ivan Oranciuc, Artyom V. Poptsov, Apexq, Migl...",55.15,0,353411,432,63100
...,...,...,...,...,...,...,...,...,...
995,xai-org/grok-1,https://trendshift.io/repositories/8659,Python,[],0.00,0,0,7,8400
996,xlang-ai/OSWorld,https://trendshift.io/repositories/9346,Python,[],0.00,0,0,26,0
997,adrianhajdin/banking,https://trendshift.io/repositories/9879,TypeScript,[],0.00,0,0,2,0
998,Tencent/HunyuanDiT,https://trendshift.io/repositories/10188,Jupyter Notebook,[],0.00,0,0,7,0


In [209]:
cols_to_normalize = ['resolution_time_avg', 'watchers', 'contributors_count', 'forks']

# Min-Max Normalization (scalare i dati tra 0 e 1)
scaler_minmax = MinMaxScaler()
df[cols_to_normalize] = scaler_minmax.fit_transform(df[cols_to_normalize])

df

Unnamed: 0,name,repo_link,language,contributors,resolution_time_avg,is_ai,watchers,contributors_count,forks
0,codecrafters-io/build-your-own-x,https://trendshift.io/repositories/2841,Markdown,"[Paul Kuruvilla, NintenHero, Paul Kuruvilla, W...",0.003289,0,0.869133,0.121951,0.423515
1,lobehub/lobe-chat,https://trendshift.io/repositories/2256,TypeScript,"[lobehubbot, cnJasonZ, lobehubbot, semantic-re...",0.000934,0,0.139847,0.230691,0.155499
2,langgenius/dify,https://trendshift.io/repositories/2152,TypeScript,"[诗浓, Jyong, Jyong, crazywoola, Yongtao Huang, ...",0.000225,0,0.201573,0.713415,0.156764
3,practical-tutorials/project-based-learning,https://trendshift.io/repositories/2804,Unknown,"[Axel Baudot, Cheese, Rory Donald, Rivaan Rana...",0.057789,0,0.535076,0.105691,0.365360
4,EbookFoundation/free-programming-books,https://trendshift.io/repositories/2657,HTML,"[Ivan Oranciuc, Artyom V. Poptsov, Apexq, Migl...",0.003473,0,0.854531,0.439024,0.797724
...,...,...,...,...,...,...,...,...,...
995,xai-org/grok-1,https://trendshift.io/repositories/8659,Python,[],0.000000,0,0.000000,0.007114,0.106195
996,xlang-ai/OSWorld,https://trendshift.io/repositories/9346,Python,[],0.000000,0,0.000000,0.026423,0.000000
997,adrianhajdin/banking,https://trendshift.io/repositories/9879,TypeScript,[],0.000000,0,0.000000,0.002033,0.000000
998,Tencent/HunyuanDiT,https://trendshift.io/repositories/10188,Jupyter Notebook,[],0.000000,0,0.000000,0.007114,0.000000


## Analysis

- distribuzione popolarità: per ogni kpi fai asse x posizione, y kpi
	- per i linguaggi fai ranking in base alla posizione (piu in alto e più rilevante)


- treemap linguaggi rispetto al counter

- grafico coordinates:
	- evidenzia posizione con colore piu acceso
	- quadranti
		- resolution_time_avg
		- watchers
		- contributors count
		- forks

- i primi 30 repo hanno sempre gli stessi che committano (a parte i bot)
	- istogramma

# 5 Conclusions