# Descriptive analytics donor info

This notebook contains descriptive analytics for the donor info CSV file.

The combined data file contains the scores on the personality questionnaires and demographic data.

Sum: combined scores from time point 4 until time point 7 on the BDRI questionnaire (vasovagal reactions, VVR).

sum2 (split on the mean value of the sum of BDRI scores):

0. low VVR
1. high VVR
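
A minimal sketch of a mean split, using made-up sums rather than the study data. The names `toy` and `high_vvr` are hypothetical, and the direction of the coding (1 = above the mean) is an assumption taken from the list above. The final `groupby` is one way to inspect which range of sums each code covers, and the same check could be run on the actual binary column before relying on its coding.

```python
import pandas as pd

# Made-up BDRI sums for five hypothetical donors (not study data)
toy = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "Sum": [8, 9, 10, 12, 21]})

# Split on the mean of the summed scores.
# Assumption: 1 = high VVR (above the mean), 0 = low VVR.
mean_sum = toy["Sum"].mean()
toy["high_vvr"] = (toy["Sum"] > mean_sum).astype(int)

# Inspect the range of sums that falls under each code
sum_range = toy.groupby("high_vvr")["Sum"].agg(["min", "max"])
print(mean_sum)
print(sum_range)
```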

In [2]:
# import 
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load csv file 

In [3]:
donor_info_path = "/Users/dionnespaltman/Desktop/downloading/20221101_donor_info2.csv"

# Load the file with the first column (ID) as the index
df = pd.read_csv(donor_info_path, index_col=0)
df = df.rename(columns={'sum': 'Sum', 'binary': 'Binary'})

# Second copy that keeps ID as a regular column
df_with_id = pd.read_csv(donor_info_path)

df.head()

ID,Time_point,Path,Sum,Binary,Gender,Location,Condition
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1


In [4]:
# Count the columns (the ID index is not counted as a column)
num_columns = df.shape[1]
print("Number of columns in the DataFrame:", num_columns)


Number of columns in the DataFrame: 7


In [5]:
display(df)

ID,Time_point,Path,Sum,Binary,Gender,Location,Condition
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
...,...,...,...,...,...,...,...
291,2,video291_2.MOV,10,1,1,0,3
291,4,video291_4.MOV,12,0,1,0,3
291,5,video291_5.MOV,8,1,1,0,3
291,6,video291_6.MOV,10,1,1,0,3


In [6]:
print(f"Number of categories in the ID column is {df_with_id.ID.nunique()}.")
print(f"Number of categories in the time point column is {df.Time_point.nunique()}.")
print(f"Number of categories in the path column is {df.Path.nunique()}.")
print(f"Number of categories in the sum column is {df.Sum.nunique()}.")
print(f"Number of categories in the binary column is {df.Binary.nunique()}.")
print(f"Number of categories in the gender column is {df.Gender.nunique()}.")
print(f"Number of categories in the location column is {df.Location.nunique()}.")
print(f"Number of categories in the condition column is {df.Condition.nunique()}.")

Number of categories in the ID column is 281.
Number of categories in the time point column is 7.
Number of categories in the path column is 2792.
Number of categories in the sum column is 21.
Number of categories in the binary column is 2.
Number of categories in the gender column is 2.
Number of categories in the location column is 3.
Number of categories in the condition column is 3.
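
The same counts can be obtained in a single call with `DataFrame.nunique()`. The sketch below uses a small made-up frame (`toy` is hypothetical, not the donor data) with `ID` as the index, mirroring the layout of `df`.

```python
import pandas as pd

# Small made-up frame with ID as the index, mirroring the layout of df
toy = pd.DataFrame(
    {
        "Time_point": [1, 1, 2, 1],
        "Gender": [2, 2, 2, 1],
        "Condition": [1, 1, 1, 3],
    },
    index=pd.Index([5, 5, 5, 6], name="ID"),
)

# Number of unique values per column, in one call
counts = toy.nunique()

# The index is not a column, so count its unique values separately
counts["ID"] = toy.index.nunique()

print(counts)
```

Because `ID` is the index here, it has to be counted separately, which is why the cell above falls back on `df_with_id` for that column.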


# Experimental condition

1. control (experienced donors who did not have any vasovagal reactions previously)
2. experimental/sensitive (experienced donors who had vasovagal reaction at the previous donation)
3. first-time donors

In [7]:
condition_values = df['Condition'].unique()

print(condition_values)

[1 2 3]


# Unique IDs

We have 281 unique participants. 

In [8]:
# Count the unique IDs in the index
unique_ids = df.index.nunique()
print("Number of unique IDs:", unique_ids)


Number of unique IDs: 281


# Gender 

We have 119 male and 162 female participants.

1. Male
2. Female      

In [9]:
# Count participants per gender, using one row per ID
gender_counts = df.reset_index().drop_duplicates(subset='ID').groupby('Gender').size()
print(gender_counts)

Gender
1    119
2    162
dtype: int64
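
To make the output easier to read, the numeric codes can be mapped to their labels before counting. This is a sketch on made-up rows; `toy` and `gender_labels` are hypothetical names, and the mapping follows the coding listed above.

```python
import pandas as pd

# Coding taken from the list above: 1 = Male, 2 = Female
gender_labels = {1: "Male", 2: "Female"}

# Made-up rows: ID 5 appears twice, as in the real file
toy = pd.DataFrame({"ID": [5, 5, 6, 7], "Gender": [2, 2, 1, 2]})

# Keep one row per participant, then count the labelled values
per_participant = toy.drop_duplicates(subset="ID")
labelled_counts = per_participant["Gender"].map(gender_labels).value_counts()

print(labelled_counts)
```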


# Location

Where the data was collected. Four locations are defined (0 to 3), but only the values 0, 1 and 2 occur in this file:

0. Den Bosch
1. Leiden
2. Zwolle
3. Utrecht-Uithof    

In [10]:
location_values = df['Location'].unique()

print(location_values)

[0 1 2]
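
A quick way to see which coded locations never occur is to compare the codes in the list above with the observed values. In this sketch the `observed` series is typed in from the printed output rather than read from the file, and `location_labels` is a hypothetical name for the coding.

```python
import pandas as pd

# Coding taken from the list above
location_labels = {0: "Den Bosch", 1: "Leiden", 2: "Zwolle", 3: "Utrecht-Uithof"}

# Values as printed by the cell above (stand-in for df['Location'])
observed = pd.Series([0, 1, 2], name="Location")

# Codes that are defined but never observed
missing = sorted(set(location_labels) - set(observed.unique().tolist()))
missing_names = [location_labels[code] for code in missing]

print(missing_names)
```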


# Sum 

In [9]:
sum_values = df['Sum'].unique()

print(sum_values)

[12 10  9  8 11 14 40 18 16 15 13 20 19 17 21 25 26 23 22 24 30]


In [11]:
# sum2_values = df['Sum2'].unique()

# print(sum2_values)

# Further checks

In [11]:
display(df)

ID,Time_point,Path,Sum,Binary,Gender,Location,Condition
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
...,...,...,...,...,...,...,...
291,2,video291_2.MOV,10,1,1,0,3
291,4,video291_4.MOV,12,0,1,0,3
291,5,video291_5.MOV,8,1,1,0,3
291,6,video291_6.MOV,10,1,1,0,3


### Checking if the condition is the same for every ID 

In [12]:
# Group the DataFrame by 'ID' and check if each group has only one unique condition
consistent_conditions = df.groupby('ID')['Condition'].nunique() == 1

# Check if all IDs have consistent conditions
all_consistent = consistent_conditions.all()

if all_consistent:
    print("For every ID, the condition is always the same.")
else:
    print("For some IDs, the condition varies.")

For every ID, the condition is always the same.
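
If the check were to fail, it would help to know which IDs are affected. The sketch below uses made-up data with one deliberate inconsistency; `toy` and `inconsistent_ids` are hypothetical names.

```python
import pandas as pd

# Made-up data: ID 7 deliberately has two different conditions
toy = pd.DataFrame(
    {"Condition": [1, 1, 2, 3, 1]},
    index=pd.Index([5, 5, 6, 7, 7], name="ID"),
)

# Number of distinct conditions per ID
n_conditions = toy.groupby("ID")["Condition"].nunique()

# IDs with more than one condition
inconsistent_ids = n_conditions[n_conditions > 1].index.tolist()

print(inconsistent_ids)
```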


### Creating a new DataFrame with one row per unique ID

In [13]:
# Reset the index to make 'ID' a regular column
df_reset_index = df.reset_index()

# Create a new DataFrame with unique IDs
unique_ids_df = df_reset_index.drop_duplicates(subset='ID')

# Display the new DataFrame
display(unique_ids_df)


,ID,Time_point,Path,Sum,Binary,Gender,Location,Condition
0,5,1,video5_1.MOV,12,0,2,0,1
63,6,1,video6_1.MOV,8,1,1,0,2
112,7,1,video7_1.MOV,11,1,2,0,2
196,8,1,video8_1.MOV,9,1,1,0,1
245,9,1,video9_1.MOV,8,1,1,0,1
...,...,...,...,...,...,...,...,...
15065,287,1,video287_1.MOV,17,0,2,0,2
15101,288,1,video288_1.MOV,8,1,2,0,3
15107,289,1,video289_1.MOV,9,1,1,0,2
15113,290,1,video290_1.MOV,8,1,2,0,3


### Getting a list of all unique IDs

In [14]:
# Get a list of all unique IDs
unique_ids = df.index.unique().tolist()

# Display the list of unique IDs
print(unique_ids)


[5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 23

# Creating a pickle file of the DataFrame (loads faster than the CSV)

In [15]:
# Save df as a pickle file
df.to_pickle("/Users/dionnespaltman/Desktop/downloading/donor_info.pkl")

In [16]:
file_path = '/Users/dionnespaltman/Desktop/downloading/donor_info.pkl'

# Read the pickle file into a DataFrame
df = pd.read_pickle(file_path)

# Display the DataFrame
display(df)

ID,Time_point,Path,Sum,Binary,Gender,Location,Condition
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
5,1,video5_1.MOV,12,0,2,0,1
...,...,...,...,...,...,...,...
291,2,video291_2.MOV,10,1,1,0,3
291,4,video291_4.MOV,12,0,1,0,3
291,5,video291_5.MOV,8,1,1,0,3
291,6,video291_6.MOV,10,1,1,0,3
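
To confirm that nothing is lost when saving and reloading, the round trip can be checked explicitly. This is a sketch on a made-up frame written to a temporary directory; `toy` and the file name `toy_donor_info.pkl` are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame with ID as the index, mirroring the layout of df
toy = pd.DataFrame(
    {"Path": ["video5_1.MOV", "video6_1.MOV"], "Sum": [12, 8]},
    index=pd.Index([5, 6], name="ID"),
)

# Write to a temporary location and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "toy_donor_info.pkl"
    toy.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

# Raises an AssertionError if values, dtypes or index differ
pd.testing.assert_frame_equal(toy, restored)
round_trip_ok = toy.equals(restored)

print(round_trip_ok)
```

Unlike a CSV round trip, the pickle keeps the index and the column dtypes, so no `index_col` argument is needed when reading it back.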
