> *python3 main.py*

**☝️Run on Terminal**<br>
**👇Run on Notebook**

Let's start by installing the necessary libraries.

In [89]:
%pip install plotly pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.





The `methods` module provides all the functions needed to extract the required code samples.

In [90]:
from methods import (filter_by_date, get_samples_from_csv, split_samples,
                     get_paths, get_by_path, create_csv_with_samples)
from datetime import datetime

From the code samples provided, let's use those posted on Stack Overflow between 2023-01-01 and 2024-06-05. Afterwards, let's split the result into `answered` and `unanswered`.

In [91]:
samples = filter_by_date(
    get_samples_from_csv('sampleQuestions.csv'),
    datetime.fromisoformat('2023-01-01'),
    datetime.fromisoformat('2024-06-05'),
)
answered, unanswered = split_samples(samples)
print(answered)
print(unanswered)

[('github.com', 'compose-samples', '75721334', '2023-03-13T11:34:55.060', '7767664', '1', '75721679'), ('github.com', 'compose-samples', '76528454', '2023-06-22T03:33:51.763', '828896', '1', '76586020'), ('github.com', 'compose-samples', '76685433', '2023-07-14T07:09:53.700', '21197370', '1', '76692740'), ('github.com', 'compose-samples', '77297409', '2023-10-15T15:52:44.370', '15597975', '1', '77297541'), ('github.com', 'nowinandroid', '75495824', '2023-02-18T19:22:55.193', '15597975', '1', '75692828'), ('github.com', 'nowinandroid', '75721334', '2023-03-13T11:34:55.060', '7767664', '1', '75721679'), ('github.com', 'nowinandroid', '75855649', '2023-03-27T12:18:20.317', '7767664', '1', '75856486'), ('github.com', 'snippets', '76912586', '2023-08-16T10:10:45.060', '6751083', '1', '76918823'), ('github.com', 'wear-os-samples', '75455218', '2023-02-15T03:21:35.907', '20057970', '1', '77393816'), ('github.com', 'user-interface-samples', '75907763', '2023-04-01T17:24:20.537', '21544331', '1
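
The `methods` module is not shown here, so the following is only a sketch of what the date filter and the answered/unanswered split might look like. It assumes each sample is a tuple whose fourth field is an ISO 8601 creation timestamp and whose last field is an answer id that is empty for unanswered questions. The field positions, the empty-string convention, and the `_sketch` function names are assumptions inferred from the printed output, not the actual implementation.

```python
from datetime import datetime

# Hypothetical field positions, inferred from the printed tuples above.
DATE_FIELD = 3
ANSWER_FIELD = 6


def filter_by_date_sketch(samples, start, end):
    """Keep samples whose creation date falls within [start, end]."""
    return [
        s for s in samples
        if start <= datetime.fromisoformat(s[DATE_FIELD]) <= end
    ]


def split_samples_sketch(samples):
    """Split into (answered, unanswered) by whether the answer field is set."""
    answered = [s for s in samples if s[ANSWER_FIELD]]
    unanswered = [s for s in samples if not s[ANSWER_FIELD]]
    return answered, unanswered


demo = [
    # First row copied from the output above; the other two are made up.
    ('github.com', 'compose-samples', '75721334', '2023-03-13T11:34:55.060', '7767664', '1', '75721679'),
    ('github.com', 'example-path', '1', '2022-05-01T00:00:00.000', '2', '1', ''),  # outside the period
    ('github.com', 'example-path', '3', '2023-08-01T00:00:00.000', '4', '1', ''),  # in period, unanswered
]

in_period = filter_by_date_sketch(
    demo,
    datetime.fromisoformat('2023-01-01'),
    datetime.fromisoformat('2024-06-05'),
)
answered_demo, unanswered_demo = split_samples_sketch(in_period)
print(len(in_period), len(answered_demo), len(unanswered_demo))  # 2 1 1
```
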

To compare the answered and unanswered groups on equal footing, we keep only the questions whose path is present in both groups.

In [92]:
paths_in_both = get_paths(answered) & get_paths(unanswered)
len(paths_in_both)

14
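
The `&` operator above is a set intersection, which implies that `get_paths` returns a `set`. A minimal sketch of that idea follows, assuming the path is the second field of each tuple. `get_paths_sketch` and the demo rows are hypothetical stand-ins, not the code in `methods`.

```python
PATH_FIELD = 1  # assumed position of the path in each sample tuple


def get_paths_sketch(samples):
    """Return the set of distinct paths that occur in the samples."""
    return {s[PATH_FIELD] for s in samples}


# Shortened, made-up rows: only the first two fields matter here.
answered_demo = [('github.com', 'compose-samples'), ('github.com', 'snippets')]
unanswered_demo = [('github.com', 'snippets'), ('github.com', 'sunflower')]

# Only paths that occur in both groups survive the intersection.
common = get_paths_sketch(answered_demo) & get_paths_sketch(unanswered_demo)
print(common)  # {'snippets'}
```
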

As a result, 14 paths are present in both groups (answered and unanswered).
<br>
We can now collect the code samples from these 14 common paths.
<br>
We then split the collected samples into answered and unanswered again.

In [93]:
by_path = []
for path in paths_in_both:
    by_path.extend(get_by_path(samples, path))
ans_by_path, uns_by_path = split_samples(by_path)
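
For illustration, here is a sketch of how a per-path lookup could work, together with an equivalent single-pass filter. `get_by_path_sketch`, the field position, and the demo rows are assumptions. The loop version groups samples by path, while the single pass keeps the original order, so the two results are compared after sorting.

```python
PATH_FIELD = 1  # assumed position of the path in each sample tuple


def get_by_path_sketch(samples, path):
    """Return every sample whose path equals the given path."""
    return [s for s in samples if s[PATH_FIELD] == path]


# Made-up rows: (domain, path, question id).
samples_demo = [
    ('github.com', 'snippets', '1'),
    ('github.com', 'ndk', '2'),
    ('github.com', 'only-answered-path', '3'),
    ('github.com', 'snippets', '4'),
]
paths_demo = {'snippets', 'ndk'}

# Loop version, mirroring the cell above.
by_path_demo = []
for path in paths_demo:
    by_path_demo.extend(get_by_path_sketch(samples_demo, path))

# Single pass over the samples with a set membership test.
single_pass = [s for s in samples_demo if s[PATH_FIELD] in paths_demo]

print(sorted(by_path_demo) == sorted(single_pass))  # True
```
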

Check the data overview:

In [94]:
print(f'Total code samples from period: {len(samples)}')
print(f'Total samples in both paths {len(by_path)}')
print(f'Answered from paths: {len(ans_by_path)}')
print(f'Unanswered from paths: {len(uns_by_path)}')
print(f'Total of paths present in both: {len(paths_in_both)}')
print(f'List of paths in both: {paths_in_both}')

Total code samples from period: 208
Total samples in both paths 168
Answered from paths: 46
Unanswered from paths: 122
Total of paths present in both: 14
List of paths in both: {'wear-os-samples', 'connectivity-samples', '.github', 'user-interface-samples', 'ndk', 'compose-samples', 'health-samples', 'nowinandroid', 'car-samples', 'rr', 'sunflower', 'kotlin', 'architecture-samples', 'snippets'}
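
The figures above can be cross-checked with a few lines of arithmetic. The numbers are copied from the printed output. The interpretation of the difference as "samples whose path occurs in only one group" assumes that `split_samples` assigns every sample to exactly one of the two groups.

```python
# Values copied from the overview output above.
total_in_period = 208
total_in_common_paths = 168
answered_count = 46
unanswered_count = 122

# The two groups should add up to the samples kept by the path filter.
assert answered_count + unanswered_count == total_in_common_paths

# Samples dropped by the path filter (assumed: path occurs in only one group).
excluded = total_in_period - total_in_common_paths
answered_share = answered_count / total_in_common_paths

print(f'Excluded by the path filter: {excluded}')  # 40
print(f'Answered share: {answered_share:.1%}')     # 27.4%
```
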


Create a CSV file with the data:

In [95]:
create_csv_with_samples(by_path, 'samplesInBothPaths.csv')
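
As a rough idea of what this step might do, the sketch below writes sample tuples to a CSV file with the standard `csv` module. Only the `path` column name is confirmed, since the next cell reads `df['path']`. The other header names, the function name `create_csv_sketch`, and the temporary output location are assumptions.

```python
import csv
import os
import tempfile

# Hypothetical header: only 'path' is confirmed by the later df['path'] lookup.
HEADER = ['domain', 'path', 'question_id', 'creation_date',
          'user_id', 'field_6', 'answer_id']


def create_csv_sketch(samples, filename):
    """Write a header row followed by one row per sample tuple."""
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(samples)


demo = [
    ('github.com', 'compose-samples', '75721334',
     '2023-03-13T11:34:55.060', '7767664', '1', '75721679'),
]

# Write to a temporary directory so the real output file is left untouched.
out_path = os.path.join(tempfile.mkdtemp(), 'samples_demo.csv')
create_csv_sketch(demo, out_path)

# Read the file back to confirm the layout.
with open(out_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows[0][1], rows[1][1])  # path compose-samples
```
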

In [96]:
import plotly.express as px
import pandas as pd

df = pd.read_csv('samplesInBothPaths.csv')

# Count how many samples belong to each path
category_counts = df['path'].value_counts().reset_index()
category_counts.columns = ['group', 'count']

# Pie chart of the share of samples per path
fig = px.pie(
    category_counts,
    values='count',
    names='group',
    title='Pie chart of samples by path',
    color='group',
)

fig.update_traces(textposition='inside', textinfo='percent+label')

fig.show()