# Installation

## Overview 

As explained in the README, the datasets should already be fetched locally before you run this notebook.

See `../README.md` and `../script/fetch_drupal_data.sh`.

<details>
    <summary>Click to check data folder structure</summary>
    
```bash
data
├── csv
│   └── countries.csv
└── json
    ├── pages_event
    │   ├── page_0.json
    │   └── page_x.json
    ├── pages_organization
    │   ├── page_0.json
    │   └── page_x.json
    ├── pages_user
    │   ├── page_0.json
    │   └── page_x.json
```
</details>
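
The paged layout above can be walked with a small helper. This is a minimal sketch, assuming the files follow the `page_<number>.json` naming shown in the tree; `list_pages` is a hypothetical name, not part of the project scripts. It sorts by page number because a plain alphabetical sort would place `page_10.json` before `page_2.json`.

```python
import re
from pathlib import Path


def list_pages(folder):
    """Return the page_<n>.json files in `folder`, ordered by page number."""
    pattern = re.compile(r"page_(\d+)\.json$")
    pages = []
    for path in Path(folder).glob("page_*.json"):
        match = pattern.match(path.name)
        if match:
            # Keep the numeric page index so the sort is numeric, not textual
            pages.append((int(match.group(1)), path))
    return [path for _, path in sorted(pages)]
```

For example, `list_pages("../data/json/pages_user")` would return the user pages in order, assuming the `data` folder sits one level above this notebook.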


## Comments

In [None]:
import os
import re
import requests
import pandas as pd

BASE_URL = "https://www.drupal.org/api-d7"

HEADERS = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'User-Agent': 'Drucom 0.1.0'
}


def fetch_comments_by_user(id: int) -> tuple:
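    """
    Fetch comment statistics for a user from the drupal.org API.

    Parameters
    ----------
    id : int
        The drupal.org user ID.

    Returns
    -------
    tuple or None
        None if no comments were found, otherwise a tuple containing:
        - total: The number of comments by this user.
        - first: The `created` timestamp of the first comment.
        - last: The `created` timestamp of the last comment.
    """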
    total = 0
    first_created = None
    last_created = None

    # Build the query parameters: one comment per page, ordered by comment ID
    PARAMS = {
        'author': id,
        'limit': 1,
        'sort': 'cid',
        'direction': 'ASC',
    }

    print(f"Fetching comments for user ID: {id}")
    response = requests.get(
        f"{BASE_URL}/comment.json", headers=HEADERS, params=PARAMS)

    data = response.json()
    last_page_url = data.get('last', '')

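    # With limit=1 each page holds one comment, so the page number found in
    # the "last" link is used as the comment count.
    # NOTE: if the API's page index is zero-based, this undercounts by one
    # and skips users who have a single comment.
    match = re.search(r'page=(\d+)', last_page_url)
    if match:
        total = int(match.group(1))

    if total <= 0:
        return None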

    comments_list = data.get('list', [])
    first_created = comments_list[0].get('created', None)

    print(f"Fetching last comment for user ID: {id}")
    PARAMS['page'] = (total - 1)
    response = requests.get(
        f"{BASE_URL}/comment.json", headers=HEADERS, params=PARAMS)
    data = response.json()
    comments_list = data.get('list', [])
    last_created = comments_list[0].get('created', None)

    return (total, first_created, last_created)


# df = pd.read_parquet(os.path.join(SCRIPT_DIR, '../data/user.parquet'))
df = pd.read_parquet('../devusers.parquet')
users = df.copy()
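
The function above derives the comment count from the `page` number in the `last` link. A minimal sketch of that step in isolation follows, using `urllib.parse` instead of a regular expression. It assumes the page index is zero-based (so the count is the index plus one), which is not confirmed by this notebook. The helper names and the example URL are made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def last_page_index(last_url):
    """Return the integer `page` query parameter of a URL, or None if absent."""
    values = parse_qs(urlparse(last_url).query).get("page")
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None


def comment_count(last_url):
    """Estimate the comment count for limit=1, assuming zero-based pages."""
    index = last_page_index(last_url)
    return 0 if index is None else index + 1


# Hypothetical "last" link, shaped like the request built above
example = ("https://www.drupal.org/api-d7/comment.json"
           "?author=3&limit=1&sort=cid&direction=ASC&page=41")
print(comment_count(example))
```

Parsing the query string matches only the exact `page` parameter, whereas `page=(\d+)` would also match a parameter whose name merely ends in `page`. The caller still needs to check that the returned `list` is non-empty before trusting the count.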


In [None]:
users.head()


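| | id | title | fname | lname | created | da_membership | slack | mentors | countries | language | languages | timezone | region | city | organizations | industries | contributions | events |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 188255 | duopixel | | | 1190881283 | | | [] | [MX] | | [] | | | | [] | | [] | [] |
| 1 | 1791882 | Evan James | | | 1328138277 | | | [] | [] | | [] | | | | [] | | [] | [] |
| 2 | 1090900 | detoxforalcoql | | | 1293670876 | | | [] | [] | | [] | | | | [] | | [] | [] |
| 3 | 2178260 | gerardobeebe12 | | | 1342412824 | | | [] | [] | | [] | | | | [] | | [] | [] |
| 4 | 38258 | peterdv | | | 1131020515 | | | [] | [BE] | | [] | | | | [] | | [] | [] |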
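
The `created` values look like Unix timestamps in seconds. That unit is an assumption based on their magnitude; the notebook does not state it. A minimal sketch of converting one value to a readable UTC date (`to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def to_utc(created):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(created), tz=timezone.utc)


# `created` value of the first user listed above
joined = to_utc(1190881283)
print(joined.isoformat())
```

For the whole column, `pd.to_datetime(users['created'], unit='s', utc=True)` performs the same conversion in one call.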


In [None]:
users['comments'] = users['id'].apply(fetch_comments_by_user)


Fetching comments for user ID: 188255
Fetching comments for user ID: 1791882
Fetching comments for user ID: 1090900
Fetching comments for user ID: 2178260
Fetching comments for user ID: 38258
Fetching comments for user ID: 1889652
Fetching comments for user ID: 577522
Fetching comments for user ID: 1685952
Fetching comments for user ID: 369966
Fetching last comment for user ID: 369966
Fetching comments for user ID: 1702260
Fetching comments for user ID: 54108
Fetching comments for user ID: 2447936
Fetching comments for user ID: 3806578
Fetching comments for user ID: 1591930
Fetching comments for user ID: 3577468
Fetching comments for user ID: 435216
Fetching comments for user ID: 3337570
Fetching comments for user ID: 111433
Fetching comments for user ID: 668166
Fetching comments for user ID: 1203892
Fetching comments for user ID: 3731153
Fetching comments for user ID: 1223668
Fetching comments for user ID: 1062884
Fetching comments for user ID: 3403582
Fetching comments for user ID: 3

In [None]:
users.head(3)


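| | id | title | fname | lname | created | da_membership | slack | mentors | countries | language | languages | timezone | region | city | organizations | industries | contributions | events | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 188255 | duopixel | | | 1190881283 | | | [] | [MX] | | [] | | | | [] | | [] | [] | |
| 1 | 1791882 | Evan James | | | 1328138277 | | | [] | [] | | [] | | | | [] | | [] | [] | |
| 2 | 1090900 | detoxforalcoql | | | 1293670876 | | | [] | [] | | [] | | | | [] | | [] | [] | |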
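
Each cell of the `comments` column holds either `None` or a `(total, first, last)` tuple, which is awkward to filter or sort on. One possible way to split it into separate columns is sketched below on a toy frame. The values and the new column names (`comment_count`, `first_comment`, `last_comment`) are made up for illustration.

```python
import pandas as pd

# Toy stand-in for `users`; the values are illustrative only
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "comments": [None, (12, 1131020515, 1342412824), None],
})

# Replace missing results with an explicit "no comments" tuple
rows = [
    value if isinstance(value, tuple) else (0, None, None)
    for value in toy["comments"]
]

# One column per tuple field, aligned on the original index
parts = pd.DataFrame(
    rows,
    columns=["comment_count", "first_comment", "last_comment"],
    index=toy.index,
)
expanded = toy.join(parts)
print(expanded[["id", "comment_count"]])
```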


In [None]:
print(users.size)
print(users[users['comments'].isnull()].size)
print(users[~users['comments'].isnull()].size)


1900
1691
209
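
Note that `DataFrame.size` counts cells (rows × columns), not rows. If the frame has the 19 columns shown in the header above, the three numbers correspond to 100 users in total, 89 without comments and 11 with comments. A minimal sketch of counting rows directly, on a toy frame with made-up values:

```python
import pandas as pd

# Toy stand-in for `users`; the values are illustrative only
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": ["a", "b", "c", "d"],
    "comments": [None, (3, 10, 20), None, None],
})

cells = toy.size                                      # rows * columns
rows = len(toy)                                       # number of users
with_comments = int(toy["comments"].notna().sum())    # rows with a result
without_comments = rows - with_comments

print(cells, rows, with_comments, without_comments)
```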


## Next step

🔎 You can now open [the exploration](./exploration.ipynb) notebook.