# Predicting T-shirt size using the ANSUR II dataset
In this notebook we try to predict a person's t-shirt size from their weight and height. We use the ANSUR II dataset, which contains a wide range of body measurements for a large number of people.

We first map each person in the dataset to a t-shirt size. It is hard to find a concise size chart for t-shirts, so we create our own initial chart based on these assumptions:

We will only look at two measurements, Shoulder Width and Chest Circumference.

Our first problem is that Shoulder Width is not one of the measurements taken in the dataset. It does, however, contain Biacromial Breadth, the distance between the two acromion processes. We will assume that this is the same as Shoulder Width.

This gives us the following initial rules, where each size covers a percentile range of a measurement:

| Size | Percentile range |
|------|------------------|
| XS   | 0-5              |
| S    | 5-25             |
| M    | 25-50            |
| L    | 50-75            |
| XL   | 75-90            |
| 2XL  | 90-97            |
| 3XL  | 97-100           |
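The rules in the table can be read as a lookup from a percentile to a size. Below is a minimal sketch of that lookup. It assumes that each range includes its lower bound and excludes its upper bound (the table does not say which side a boundary such as 5 belongs to), and it uses `2XL` and `3XL`, the labels the code later in this notebook uses for the two largest sizes. `size_for_percentile` is a hypothetical helper, not part of the notebook.

```python
from bisect import bisect_right

# Upper percentile bound of each size, in the order of the table.
SIZE_UPPER_BOUNDS = [5, 25, 50, 75, 90, 97, 100]
SIZES = ['XS', 'S', 'M', 'L', 'XL', '2XL', '3XL']


def size_for_percentile(percentile):
    """Return the size whose percentile range contains `percentile` (0-100)."""
    if not 0 <= percentile <= 100:
        raise ValueError('percentile must be between 0 and 100')
    # bisect_right places a boundary value (e.g. 5) in the range above it (S).
    # min() keeps the 100th percentile inside the last range (assumption).
    index = min(bisect_right(SIZE_UPPER_BOUNDS, percentile), len(SIZES) - 1)
    return SIZES[index]


print(size_for_percentile(30))
```

With this convention a person at exactly the 25th percentile is an M rather than an S. Choosing the other side of each boundary would only need `bisect_left` instead.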

## Inspect the data

In [1]:
import pandas as pd

In [2]:
female = pd.read_csv('../data/female.csv')
male = pd.read_csv('../data/male.csv')

In [3]:
print(f'For women we have (rows, columns) {female.shape}')
print(f'For men we have (rows, columns) {male.shape}')

For women we have (rows, columns) (1986, 108)
For men we have (rows, columns) (4082, 108)


## Checking the percentiles

Let us count how many people fall into each percentile range for both measurements.

In [4]:
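def compute_percentile_ranges(column):
    """Count the values of `column` that fall into each percentile range."""
    # Percentile ranges, one per t-shirt size (XS to 3XL)
    ranges = [(0, 5), (5, 25), (25, 50), (50, 75), (75, 90), (90, 97), (97, 100)]

    # Translate each percentile range into a (lower, upper) pair of measurement values
    bounds = {(low, high): (column.quantile(low/100), column.quantile(high/100)) for low, high in ranges}

    counts = {}

    for r, (lower, upper) in bounds.items():
        # The upper bound is exclusive, so the maximum value is not counted in the last range
        counts[r] = int(((column >= lower) & (column < upper)).sum())

    return counts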

# print(compute_percentile_ranges(female['chestcircumference']))
# print(compute_percentile_ranges(female['biacromialbreadth']))

# print(compute_percentile_ranges(male['chestcircumference']))
# print(compute_percentile_ranges(male['biacromialbreadth']))

In [5]:
female_chest_range = pd.DataFrame(list(compute_percentile_ranges(female['chestcircumference']).items()), columns=['Chest Range', 'Chest Count'])
female_shoulder_range = pd.DataFrame(list(compute_percentile_ranges(female['biacromialbreadth']).items()), columns=['Shoulder Range', 'Shoulder Count'])
female_ranges = pd.concat([female_chest_range, female_shoulder_range], axis=1)

female_ranges

|   | Chest Range | Chest Count | Shoulder Range | Shoulder Count |
|---|-------------|-------------|----------------|----------------|
| 0 | (0, 5)      | 100         | (0, 5)         | 93             |
| 1 | (5, 25)     | 396         | (5, 25)        | 377            |
| 2 | (25, 50)    | 492         | (25, 50)       | 477            |
| 3 | (50, 75)    | 499         | (50, 75)       | 541            |
| 4 | (75, 90)    | 299         | (75, 90)       | 297            |
| 5 | (90, 97)    | 140         | (90, 97)       | 139            |
| 6 | (97, 100)   | 59          | (97, 100)      | 61             |


In [6]:
male_chest_range = pd.DataFrame(list(compute_percentile_ranges(male['chestcircumference']).items()), columns=['Chest Range', 'Chest Count'])
male_shoulder_range = pd.DataFrame(list(compute_percentile_ranges(male['biacromialbreadth']).items()), columns=['Shoulder Range', 'Shoulder Count'])
male_ranges = pd.concat([male_chest_range, male_shoulder_range], axis=1)

male_ranges

|   | Chest Range | Chest Count | Shoulder Range | Shoulder Count |
|---|-------------|-------------|----------------|----------------|
| 0 | (0, 5)      | 199         | (0, 5)         | 191            |
| 1 | (5, 25)     | 810         | (5, 25)        | 787            |
| 2 | (25, 50)    | 1025        | (25, 50)       | 989            |
| 3 | (50, 75)    | 1012        | (50, 75)       | 1079           |
| 4 | (75, 90)    | 616         | (75, 90)       | 610            |
| 5 | (90, 97)    | 295         | (90, 97)       | 303            |
| 6 | (97, 100)   | 124         | (97, 100)      | 122            |
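Each count column above sums to one less than the number of rows (1985 of 1986 for women, 4081 of 4082 for men). This is consistent with the largest value being left out, because `compute_percentile_ranges` treats the upper bound of every range as exclusive. The sketch below shows one possible variant that closes the last range on the right. `count_per_band` and the toy series are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

RANGES = [(0, 5), (5, 25), (25, 50), (50, 75), (75, 90), (90, 97), (97, 100)]


def count_per_band(column):
    """Count values per percentile range, keeping the maximum in the last range."""
    counts = {}
    for low, high in RANGES:
        lower = column.quantile(low / 100)
        upper = column.quantile(high / 100)
        in_band = column >= lower
        if high == 100:
            # Last range: include the upper bound so the maximum is counted.
            in_band = in_band & (column <= upper)
        else:
            # All other ranges stay half-open: [lower, upper).
            in_band = in_band & (column < upper)
        counts[(low, high)] = int(in_band.sum())
    return counts


# Toy data standing in for a measurement column: the integers 1 to 100.
toy = pd.Series(range(1, 101))
counts = count_per_band(toy)
print(counts)
print(sum(counts.values()) == len(toy))
```

On the toy series every value lands in exactly one range, so the counts add up to the length of the series.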


## Generate the t-shirt size chart

In [7]:
def compute_size_percentile_mesurments(data, chest_column, shoulder_column):
    """Return, for each size, the chest and shoulder values at the lower end of its percentile range."""
    sizes = ['XS', 'S', 'M', 'L', 'XL', '2XL', '3XL']
    # Lower percentile bound of each size, in the same order as `sizes`
    ranges = [0, 5, 25, 50, 75, 90, 97]

    # Compute the values for each percentile for chest and shoulder
    chest_percentiles = {p: data[chest_column].quantile(p/100) for p in ranges}
    shoulder_percentiles = {p: data[shoulder_column].quantile(p/100) for p in ranges}

    # Map the t-shirt sizes to the corresponding chest and shoulder measurements,
    # truncated to whole numbers
    size_mappings = {}
    for size, p in zip(sizes, ranges):
        size_mappings[size] = {
            'Chest': int(chest_percentiles[p]),
            'Shoulder': int(shoulder_percentiles[p])
        }

    return size_mappings

In [8]:
print(compute_size_percentile_mesurments(female, 'chestcircumference', 'biacromialbreadth'))
print(compute_size_percentile_mesurments(male, 'chestcircumference', 'biacromialbreadth'))



{'XS': {'Chest': 695, 'Shoulder': 283}, 'S': {'Chest': 824, 'Shoulder': 335}, 'M': {'Chest': 889, 'Shoulder': 353}, 'L': {'Chest': 940, 'Shoulder': 365}, 'XL': {'Chest': 999, 'Shoulder': 378}, '2XL': {'Chest': 1057, 'Shoulder': 389}, '3XL': {'Chest': 1117, 'Shoulder': 400}}
{'XS': {'Chest': 774, 'Shoulder': 337}, 'S': {'Chest': 922, 'Shoulder': 384}, 'M': {'Chest': 996, 'Shoulder': 403}, 'L': {'Chest': 1056, 'Shoulder': 415}, 'XL': {'Chest': 1117, 'Shoulder': 428}, '2XL': {'Chest': 1172, 'Shoulder': 441}, '3XL': {'Chest': 1233, 'Shoulder': 452}}


In [9]:
# Size charts copied from the output of the previous cell

female_sizes = {
    'XS': {'Chest': 695, 'Shoulder': 283}, 
    'S': {'Chest': 824, 'Shoulder': 335}, 
    'M': {'Chest': 889, 'Shoulder': 353}, 
    'L': {'Chest': 940, 'Shoulder': 365}, 
    'XL': {'Chest': 999, 'Shoulder': 378}, 
    '2XL': {'Chest': 1057, 'Shoulder': 389}, 
    '3XL': {'Chest': 1117, 'Shoulder': 400}
    }

male_sizes = {
    'XS': {'Chest': 774, 'Shoulder': 337}, 
    'S': {'Chest': 922, 'Shoulder': 384}, 
    'M': {'Chest': 996, 'Shoulder': 403}, 
    'L': {'Chest': 1056, 'Shoulder': 415}, 
    'XL': {'Chest': 1117, 'Shoulder': 428}, 
    '2XL': {'Chest': 1172, 'Shoulder': 441}, 
    '3XL': {'Chest': 1233, 'Shoulder': 452}
    }
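The nested dictionaries are easier to compare when shown as a table. Below is a minimal sketch that converts one chart into a DataFrame with one row per size. `chart_to_frame` is a hypothetical helper, and the chart is repeated here only to keep the example self-contained.

```python
import pandas as pd

# Male chart as defined in the cell above.
male_sizes = {
    'XS': {'Chest': 774, 'Shoulder': 337},
    'S': {'Chest': 922, 'Shoulder': 384},
    'M': {'Chest': 996, 'Shoulder': 403},
    'L': {'Chest': 1056, 'Shoulder': 415},
    'XL': {'Chest': 1117, 'Shoulder': 428},
    '2XL': {'Chest': 1172, 'Shoulder': 441},
    '3XL': {'Chest': 1233, 'Shoulder': 452},
}


def chart_to_frame(size_chart):
    """Turn a {size: {'Chest': ..., 'Shoulder': ...}} chart into a DataFrame."""
    frame = pd.DataFrame.from_dict(size_chart, orient='index')
    # Reindex explicitly so the rows follow the order of the chart.
    frame = frame.loc[list(size_chart)]
    frame.index.name = 'Size'
    return frame


male_chart = chart_to_frame(male_sizes)
print(male_chart)
```

Each row holds the value at the lower end of that size's percentile range, as returned by the function in cell 7.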

## Today's lab

Earlier in the project, we mentioned that sizes based on different measurements (e.g., chest circumference and shoulder breadth) can conflict. For instance, a person might be a size S by chest but a size M by shoulders. Your task is to get a clearer picture of how many individuals get the same size from both measurements and how many fall into different sizes for shoulder breadth and chest circumference.

1. Use the size chart: Use a size chart that specifies the limits for shoulder breadth and chest circumference for each size.
2. Create a function: Write a function that iterates through each person's measurements and compares them with the size chart.
3. Count matches and conflicts: The function should count the number of individuals who have exactly one matching size and the number of individuals who have multiple possible sizes (conflicts).
4. Test your function: Run it on both the female and male datasets, using the appropriate size chart for each gender.

In [10]:
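def get_size(data, size_chart):
    """Count people with exactly one candidate size (per size) and people with several (ties)."""
    matches = {size: 0 for size in size_chart}
    ties = 0

    for _, row in data.iterrows():
        possible_sizes = []

        for size, measurements in size_chart.items():
            # A size is a candidate when the chest is at most the size's chest value
            # and the shoulders are at least the size's shoulder value
            if row['chestcircumference'] <= measurements['Chest'] and row['biacromialbreadth'] >= measurements['Shoulder']:
                possible_sizes.append(size)

        if len(possible_sizes) == 1:
            matches[possible_sizes[0]] += 1
        elif len(possible_sizes) > 1:
            ties += 1
        # People with no candidate size are counted in neither total

    return matches, ties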

female_matches, female_ties = get_size(female, female_sizes)
male_matches, male_ties = get_size(male, male_sizes)

female_matches = pd.DataFrame(list(female_matches.items()), columns=['Size', 'Count'])
male_matches = pd.DataFrame(list(male_matches.items()), columns=['Size', 'Count'])
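`get_size` tests both measurements against each chart row at once. Another reading of the task is to give each person a chest size and a shoulder size separately and then compare the two. The sketch below follows that reading and is not the method used in this notebook, so its counts would differ from the ones reported here. It assumes the chart values are lower bounds of each size. `assign_sizes`, the three-size chart and the four toy people are made up for illustration.

```python
import numpy as np
import pandas as pd


def assign_sizes(data, size_chart,
                 chest_column='chestcircumference',
                 shoulder_column='biacromialbreadth'):
    """Assign a chest size and a shoulder size to every row and flag matches."""
    sizes = np.array(list(size_chart))
    chest_bounds = [size_chart[s]['Chest'] for s in sizes]
    shoulder_bounds = [size_chart[s]['Shoulder'] for s in sizes]

    # Index of the largest lower bound that is <= the measurement.
    chest_idx = np.searchsorted(chest_bounds, data[chest_column].to_numpy(), side='right') - 1
    shoulder_idx = np.searchsorted(shoulder_bounds, data[shoulder_column].to_numpy(), side='right') - 1

    # Anything below the smallest bound is placed in the smallest size (assumption).
    chest_idx = np.clip(chest_idx, 0, None)
    shoulder_idx = np.clip(shoulder_idx, 0, None)

    result = pd.DataFrame({
        'chest_size': sizes[chest_idx],
        'shoulder_size': sizes[shoulder_idx],
    }, index=data.index)
    result['match'] = chest_idx == shoulder_idx
    return result


# Hypothetical chart and people, only to show the mechanics.
toy_chart = {
    'S': {'Chest': 800, 'Shoulder': 330},
    'M': {'Chest': 900, 'Shoulder': 350},
    'L': {'Chest': 1000, 'Shoulder': 370},
}
toy_people = pd.DataFrame({
    'chestcircumference': [850, 950, 1050, 790],
    'biacromialbreadth': [340, 375, 372, 331],
})

result = assign_sizes(toy_people, toy_chart)
print(result)
print('matches:', int(result['match'].sum()), 'conflicts:', int((~result['match']).sum()))
```

With this approach every person ends up as either a match or a conflict, so the two counts add up to the number of rows.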

In [11]:
print(f'Female ties {female_ties}')
female_matches


Female ties 390


|   | Size | Count |
|---|------|-------|
| 0 | XS   | 0     |
| 1 | S    | 35    |
| 2 | M    | 115   |
| 3 | L    | 142   |
| 4 | XL   | 77    |
| 5 | 2XL  | 34    |
| 6 | 3XL  | 10    |


In [12]:
print(f'Male ties  {male_ties}') # how to make this into a pandas dataframe? 
male_matches

Male ties  625


|   | Size | Count |
|---|------|-------|
| 0 | XS   | 1     |
| 1 | S    | 70    |
| 2 | M    | 228   |
| 3 | L    | 286   |
| 4 | XL   | 164   |
| 5 | 2XL  | 69    |
| 6 | 3XL  | 38    |
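The comment in cell 12 asks how the tie count could go into a DataFrame. One possible answer, sketched below, is to append it as an extra row to the per-size counts, together with the people who received no candidate size at all. `summarize` is a hypothetical helper. The input numbers are the male results printed above and the male row count from the start of the notebook.

```python
import pandas as pd


def summarize(matches, ties, total):
    """Combine per-size counts, ties and unmatched people into one DataFrame.

    matches: dict mapping size -> count of people with exactly one candidate size
    ties:    number of people with several candidate sizes
    total:   number of rows in the dataset
    """
    rows = list(matches.items())
    matched = sum(matches.values())
    rows.append(('Ties', ties))
    # Everyone who is neither a single match nor a tie had no candidate size.
    rows.append(('No size', total - matched - ties))
    return pd.DataFrame(rows, columns=['Size', 'Count'])


# Male results as reported in the output above.
male_counts = {'XS': 1, 'S': 70, 'M': 228, 'L': 286, 'XL': 164, '2XL': 69, '3XL': 38}
summary = summarize(male_counts, ties=625, total=4082)
print(summary)
```

In the notebook itself the dictionary returned by `get_size` would have to be passed in before it is overwritten by its DataFrame version in cell 10.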
