# Chingu Member Data Cleaning v58

# Project Overview
The project's goal is to build an app that visualizes demographics for Chingu members. The full requirements and specifications set by the Chingu organization are on GitHub: https://github.com/chingu-voyages/voyage-project-chingu-map
# Our Team
- Formed for Voyage 58, we are team 37
- Michael, Shruti, Jessica, Henry, Gisele
- Our GitHub repo containing the solution to the requirements: [V58-tier3-team-37](https://github.com/chingu-voyages/V58-tier3-team-37)
# This Notebook's Goal
- Prepare the raw data so that it's usable for the visualizations set by the requirements
	- become familiar with the data
	- remove illogical data
	- remove duplicate data
	- separate or normalize the data into logical tables
	- report information on data that was removed
- This notebook contains the steps needed to clean up the raw data and the reasoning behind them.
# Data Source
The JSON document is provided on GitHub as part of the project details:
https://github.com/chingu-voyages/voyage-project-chingu-map/blob/main/src/assets/chingu_info.json

# Plan, Part 1
1. Download the data
2. Inspect the input format
3. Decide on a strategy to flatten the data into tables

In [None]:
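import json

import requests

# Raw-content URL for the JSON file in the project repo
raw_url = 'https://raw.githubusercontent.com/chingu-voyages/voyage-project-chingu-map/main/src/assets/chingu_info.json'
# A timeout keeps the cell from hanging if the server does not respond
response = requests.get(raw_url, timeout=30)
if response.status_code == 200:
    chingu_info = response.json()
    # Pretty-print the first 50 records to inspect the structure
    print(json.dumps(chingu_info[:50], indent=2))
else:
    print(f"Failed to fetch JSON: {response.status_code}")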


[
  {
    "Timestamp": "2025-11-05 13:33",
    "Gender": "MALE",
    "Country Code": "NZ",
    "Timezone": "",
    "Goal": "ACCELERATE LEARNING",
    "Goal-Other": "",
    "Source": "PERSONAL NETWORK",
    "Source-Other": "",
    "Country name (from Country)": "New Zealand",
    "Solo Project Tier": "",
    "Role Type": "Web",
    "Voyage Role": "Developer",
    "Voyage (from Voyage Signups)": "",
    "Voyage Tier": ""
  },
  {
    "Timestamp": "2025-11-05 06:52",
    "Gender": "FEMALE",
    "Country Code": "IN",
    "Timezone": "",
    "Goal": "GAIN EXPERIENCE",
    "Goal-Other": "",
    "Source": "GOOGLE SEARCH",
    "Source-Other": "",
    "Country name (from Country)": "India",
    "Solo Project Tier": "",
    "Role Type": "Python",
    "Voyage Role": "Developer",
    "Voyage (from Voyage Signups)": "",
    "Voyage Tier": ""
  },
  {
    "Timestamp": "2025-11-04 09:14",
    "Gender": "MALE",
    "Country Code": "GE",
    "Timezone": "",
    "Goal": "NETWORK WITH SHARED GOALS",
    

# First Impressions
This appears to be a standard users dataset: a flat list of records, one per signup, all sharing the same keys.
Some column names contain special characters (spaces, hyphens, parentheses) that need to be removed, and some column types need validation.
I'll output some statistics and a list of unique values in each column to get a better feel for the data.
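
As an illustration of the column-name cleanup mentioned above, here is a minimal sketch that converts the raw names to snake_case. The helper `to_snake_case` and the choice of snake_case are assumptions made for this example, not a convention set by the project requirements.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a name and collapse runs of non-alphanumerics into '_'.

    Hypothetical helper for illustration only.
    """
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


# Empty frame using a few of the column names seen in the raw JSON.
sample = pd.DataFrame(columns=[
    "Timestamp",
    "Country Code",
    "Goal-Other",
    "Country name (from Country)",
    "Voyage (from Voyage Signups)",
])

# rename() accepts a callable and applies it to every column label.
renamed = sample.rename(columns=to_snake_case)
print(list(renamed.columns))
```

Applying the same `rename` call to the full DataFrame would leave the data untouched and only change the labels.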

In [None]:
import pandas as pd

In [None]:
df = pd.read_json('https://raw.githubusercontent.com/chingu-voyages/voyage-project-chingu-map/main/src/assets/chingu_info.json')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8970 entries, 0 to 8969
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   Timestamp                     8967 non-null   datetime64[ns]
 1   Gender                        8970 non-null   object        
 2   Country Code                  8970 non-null   object        
 3   Timezone                      8970 non-null   object        
 4   Goal                          8970 non-null   object        
 5   Goal-Other                    8970 non-null   object        
 6   Source                        8970 non-null   object        
 7   Source-Other                  8970 non-null   object        
 8   Country name (from Country)   8970 non-null   object        
 9   Solo Project Tier             8970 non-null   object        
 10  Role Type                     8970 non-null   object        
 11  Voyage Role                   
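
One caveat when reading the `df.info()` summary: the raw JSON stores blank answers as empty strings, and pandas counts an empty string as non-null. The non-null counts for the object columns therefore likely overstate how complete the data is. The sketch below shows one way to surface blanks as missing values. It uses toy rows that mimic the raw records, not real member data, and the decision to treat whitespace-only strings as missing is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the raw records; not real member data.
sample = pd.DataFrame({
    "Gender": ["MALE", "FEMALE", ""],
    "Timezone": ["", "", "GMT+1"],
})

# Replace empty or whitespace-only strings with NaN so they count as missing.
cleaned = sample.replace(r"^\s*$", np.nan, regex=True)

# Per-column count of missing values after the replacement.
missing_counts = cleaned.isna().sum()
print(missing_counts)
```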

In [None]:
# display a short list of the unique values in each column
for column in df.columns:
    unique_values = df[column].unique()
    print(f"Column: {column}")
    print(f"Unique values ({len(unique_values)}): {unique_values[:10]}")

Column: Timestamp
Unique values (8929): <DatetimeArray>
['2025-11-05 13:33:00', '2025-11-05 06:52:00', '2025-11-04 09:14:00',
 '2025-11-03 10:40:00', '2025-11-02 08:51:00', '2025-11-01 07:31:00',
 '2025-10-31 14:09:00', '2025-10-31 14:05:00', '2025-10-31 13:34:00',
 '2025-10-31 12:56:00']
Length: 10, dtype: datetime64[ns]
Column: Gender
Unique values (6): ['MALE' 'FEMALE' 'PREFER NOT TO SAY' 'NON-BINARY' 'TRANS' '']
Column: Country Code
Unique values (178): ['NZ' 'IN' 'GE' 'KE' 'IR' 'GB' 'BR' 'CR' 'GH' 'KR']
Column: Timezone
Unique values (34): ['' 'GMT-5 (New York)' 'GMT−5' 'GMT−8' 'GMT+1' 'GMT-5' 'GMT-8' 'GMT+3'
 'GMT+5' 'GMT+10']
Column: Goal
Unique values (6): ['ACCELERATE LEARNING' 'GAIN EXPERIENCE' 'NETWORK WITH SHARED GOALS'
 'GET OUT OF TUTORIAL PURGATORY' 'OTHER' '']
Column: Goal-Other
Unique values (1018): ['' 'Gain real experience and also networking'
 'I like code with people and gain experience'
 'Gain Experience, Collaborating with Team'
 'i want a chance to excercise my 


# Inferred Column Descriptions

| Column | Type | Description |
|---|---|---|
| **Timestamp** | datetime | Date and time the member submitted the signup form (format "YYYY-MM-DD HH:MM") |
| **Gender** | categorical | Gender the chingu selected in their signup form: MALE, FEMALE, PREFER NOT TO SAY, NON-BINARY, TRANS |
| **Country Code** | string (apparently ISO 3166-1 alpha-2) | Two-letter code for a chingu's country of origin (e.g. US, IN, GB) |
| **Timezone** | categorical | Timezone in which the chingu resides (e.g. "GMT-4", "GMT-5"). Input is messy: a location is sometimes included |
| **Goal** | categorical | Standardized responses for what a user wants out of Chingu (GAIN EXPERIENCE, ACCELERATE LEARNING, ...) |
| **Goal-Other** | free text | Supplied elaboration when Goal = OTHER. |
| **Source** | categorical | Standardized responses for how users found Chingu (PERSONAL NETWORK, GOOGLE SEARCH, etc.). |
| **Source-Other** | free text | Additional source detail when Source = OTHER. |
| **Country name (from Country)** | verbose country name | Redundant with Country Code (e.g., "United States", "India"). |
| **Solo Project Tier** | categorical | Tier of the solo project a chingu completed, along with that tier's description |
| **Role Type** | categorical | Type of developer a chingu is (e.g. Web, Python) |
| **Voyage Role** | categorical | Role the chingu takes in a Voyage: Developer, Scrum Master, Product Owner, UI/UX Designer. |
| **Voyage (from Voyage Signups)** | semi-structured | List of voyages a chingu has participated in |
| **Voyage Tier** | semi-structured | List of tiers (skill levels) of the teams a chingu was assigned to in their voyage signups. |
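
The Timezone values listed earlier mix several spellings of the same offset: `'GMT-5'` with an ASCII hyphen, `'GMT−5'` with a Unicode minus sign (U+2212), and `'GMT-5 (New York)'` with a location appended. A minimal sketch of one way to collapse these into a single label follows. The helper `normalize_timezone` is hypothetical, and the optional minutes part (`GMT+5:30`) is an assumption, since no such value appears in the output shown in this notebook.

```python
import re
from typing import Optional

# "GMT", a sign, 1-2 hour digits, and optionally ":MM" (assumed format).
_OFFSET = re.compile(r"GMT\s*([+-])\s*(\d{1,2})(?::(\d{2}))?")


def normalize_timezone(raw: str) -> Optional[str]:
    """Return a canonical 'GMT+H' or 'GMT-H' label, or None if no offset is found.

    Hypothetical helper for illustration only.
    """
    # Convert the Unicode minus sign (U+2212) to an ASCII hyphen first.
    text = raw.replace("\u2212", "-").upper()
    match = _OFFSET.search(text)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    label = f"GMT{sign}{int(hours)}"
    return f"{label}:{minutes}" if minutes else label


for value in ["GMT-5 (New York)", "GMT\u22125", "GMT+10", ""]:
    print(repr(value), "->", normalize_timezone(value))
```

Blank inputs, and any value without a recognizable offset, come back as `None` so they can be counted and reported rather than silently dropped.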
