We sought to analyze the relationship between phone addiction and social anxiety, using the aggregate phone addiction and social anxiety scores from the data frame. The producers of the data asked a series of questions relating to phone addiction and social anxiety, and respondents rated themselves on a scale from 1 to 5 for each question. The responses to the questions in each category (phone addiction and social anxiety, respectively) were then added up. The plot of these aggregates is below.
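As a rough illustration of how such aggregate scores can be built, the sketch below sums 1-to-5 responses within each category. The data frame and its column names are hypothetical and are not taken from phones.csv; the data producers' exact scoring procedure may differ.

```python
import pandas as pd

# Hypothetical responses: each column is one survey question, scored 1 to 5.
# The column names are made up for illustration and do not come from phones.csv.
responses = pd.DataFrame({
    "addiction_q1": [1, 4, 5],
    "addiction_q2": [2, 4, 3],
    "anxiety_q1": [3, 1, 5],
    "anxiety_q2": [2, 2, 4],
})

# Collect the question columns that belong to each category.
addiction_cols = [c for c in responses.columns if c.startswith("addiction_")]
anxiety_cols = [c for c in responses.columns if c.startswith("anxiety_")]

# Sum the items in each category to get one aggregate score per respondent.
responses["addiction_total"] = responses[addiction_cols].sum(axis=1)
responses["anxiety_total"] = responses[anxiety_cols].sum(axis=1)

print(responses[["addiction_total", "anxiety_total"]])
```

Each row is one respondent, so the two total columns play the same role as the aggregate columns plotted in this section.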

In [138]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('/work/phones.csv')
df.head()
df = df.rename(columns={'Mean value of social comfort': 'Mean value of social discomfort'})


# Column 18 is the aggregate phone addiction score; column 26 is the aggregate social anxiety score.
x = df.iloc[:, 18]
y = df.iloc[:, 26]
print(x)
print(y)

# Fit a 10-degree polynomial to the aggregate scores and evaluate it on an even grid.
best_fit = np.polyfit(x, y, 10)
p = np.poly1d(best_fit)
xp = np.linspace(0, 70, 100)
plt.scatter(x, y)
plt.plot(xp, p(xp), label='10-Degree Best Fit', color='green')
plt.xlabel('Personal Phone Addiction Score')
plt.ylabel('Personal Social Anxiety Score')
plt.title('Phone Addiction vs. Social Anxiety: Personal Scores')
plt.xlim(10, 61)
plt.ylim(0, 30)
plt.legend()
plt.show()

# Is digital media alienating or a force to bring people together?

We can see here that even with a 10-degree polynomial of best fit and a large number of points on the scatter plot, no clear relationship between the two scores stands out. So, what if we get more specific?
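One way to put a number on how little a fitted curve explains is to fit a straight line and report its slope and R². The sketch below does this on synthetic stand-in data, not the survey data; the trend strength and noise level are assumptions chosen only to mimic a weak relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the survey data): a weak linear trend plus noise.
x_demo = rng.uniform(10, 60, size=500)
y_demo = 0.1 * x_demo + rng.normal(0, 4, size=500)

# Degree-1 least-squares fit: polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x_demo, y_demo, 1)

# R^2: share of the variance in y that the fitted line explains.
residuals = y_demo - (slope * x_demo + intercept)
r_squared = 1 - residuals.var() / y_demo.var()

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A small R² here means the line accounts for only a small part of the spread in the points, which is the situation the scatter plot suggests.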

In [2]:
correlation_agg = x.corr(y)
print(f"Aggregate Score Correlation: {correlation_agg}")

This starts our exploration of correlation in Python, using the pandas Series.corr method, which computes the Pearson correlation coefficient by default. Correlation coefficients range from -1 to 1: values closer to -1 imply a strong negative correlation, values closer to 0 imply little to no linear correlation, and values closer to 1 imply a strong positive correlation.
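To make the coefficient less of a black box, the sketch below computes a Pearson correlation by hand and compares it with the pandas result. The two series are made-up numbers for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up scores for illustration only.
a = pd.Series([12, 25, 31, 40, 52])
b = pd.Series([8, 11, 9, 15, 18])

# Pearson r: the sum of products of deviations from the mean, divided by
# the square root of the product of the summed squared deviations.
a_dev = a - a.mean()
b_dev = b - b.mean()
r_manual = (a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum())

# pandas computes the same quantity; method="pearson" is the default.
r_pandas = a.corr(b)

print(r_manual, r_pandas)
```

The two printed values should agree up to floating-point rounding.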

Now, what if we look at correlations between similar questions in the data frame? Can we deduce something more specific about social anxiety and phone addiction?

In [72]:
detachment_score = df.iloc[:, 17]
club_time = df.iloc[:, 30]
nerve_score = df.iloc[:, 19]
phone_loneliness = df.iloc[:, 11]
shy_score = df.iloc[:, 24]
social_comfort = df.iloc[:, 56]
withdrawal_avg = df.iloc[:, 46]
club_avg = df.iloc[:, 51]


correlation_1 = detachment_score.corr(club_time)
print(correlation_1)
# extremely weak positive/virtually nonexistent
correlation_2 = detachment_score.corr(nerve_score)
print(correlation_2)
# pretty weak positive
correlation_3 = phone_loneliness.corr(club_time)
print(correlation_3)
# extremely weak negative/virtually nonexistent
correlation_4 = detachment_score.corr(shy_score)
print(correlation_4)
correlation_5 = detachment_score.corr(y)
print(correlation_5)
# weak positive correlation
correlation_6 = x.corr(club_time) 
print(correlation_6)
# virtually no correlation
correlation_7 = social_comfort.corr(club_time)
print(correlation_7)
# very weak negative correlation
correlation_8 = withdrawal_avg.corr(club_avg)
print(correlation_8)
# weak positive

correlation_labels = ["Detachment & club time", "Detachment & nerves", "Loneliness & club time",
                      "Detachment & shyness", "Detachment & social anxiety", "Phone addiction & club time",
                      "Social comfort & club time", "Withdrawal & club participation"]
correlation_list = [correlation_1, correlation_2, correlation_3, correlation_4,
                    correlation_5, correlation_6, correlation_7, correlation_8]
# Green: above 0.05; blue: between -0.05 and 0.05 (effectively no correlation); red: below -0.05.
colors = ['green' if c > 0.05 else 'blue' if c >= -0.05 else 'red' for c in correlation_list]
plt.barh(correlation_labels, correlation_list, color = colors)
plt.title("Phone Use, Anxiety, and Club Time Correlation Tests")
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
plt.subplots_adjust(left=0.3)
plt.xlabel("Correlation Coefficients")
plt.show()

After looking at specific correlation values across the data set, we can see that no pair of variables is strongly correlated. (Note that values greater than 0 show positive correlation, values less than 0 show negative correlation, and values close to 0 show weak to no correlation.) But here we were looking for a needle in a haystack: hunting for two variables that correlate with each other by checking a handful of pairs one at a time. What if we computed every pairwise correlation at once?

In [141]:
subset_df = df.iloc[:, 46:52]
# This limits the part of the dataframe we are using. 
# These columns are mean values of certain qualities 
# (mood changes, club participation, phone addiction levels, etc.)


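# Pairwise correlations between every pair of columns in the subset.
corr_matrix = subset_df.corr()
corr_pairs = corr_matrix.unstack()
# Drop the self-correlations on the diagonal (each equals 1).
corr_pairs = corr_pairs[corr_pairs < 1]
# Each pair appears twice, as (A, B) and (B, A), with the same value; keep one copy.
corr_pairs = corr_pairs.sort_values(ascending=False).drop_duplicates()

corr_df = corr_pairs.reset_index()
corr_df.columns = ['Variable_1', 'Variable_2', 'Correlation']
print(corr_df)
corr_df['Pair'] = corr_df['Variable_1'] + '&' + corr_df['Variable_2']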
corr_df['Pair_Shorten'] = (corr_df['Variable_1'].str.replace('Mean value of ', '') + 
                         ' & ' + 
                         corr_df['Variable_2'].str.replace('Mean value of ', ''))
corr_df['Pair_Shorten'] = corr_df['Pair_Shorten'].str.title()
plt.figure(figsize = (12,6))
# Green: above 0.3; blue: positive but at most 0.3; red: zero or negative.
colors_2 = ["green" if c > 0.3 else "blue" if c > 0 else "red" for c in corr_df['Correlation']]
plt.barh(corr_df['Pair_Shorten'], corr_df['Correlation'], color = colors_2)
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
plt.xlabel("Correlation Coefficient")
plt.title("Correlation Among Means")
# plt.tight_layout()
for i, v in enumerate(corr_df['Correlation']):
    plt.text(v, i, f' {v:.3f}', va='center', fontsize=9)
plt.show()
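The cell above removes mirrored pairs by dropping duplicate correlation values, which would also drop two different pairs that happen to share exactly the same value. An alternative sketch, shown here on a small made-up data frame rather than the survey data, keeps only the entries above the diagonal of the correlation matrix so that each pair appears exactly once.

```python
import numpy as np
import pandas as pd

# Small made-up frame; the column names are illustrative only.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 4, 5],
    "c": [5, 3, 4, 1, 2],
})

corr = demo.corr()

# Boolean mask that is True strictly above the diagonal, so each pair
# appears once and self-correlations are excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() replaces everything outside the mask with NaN; stack() flattens
# the matrix into a Series indexed by (row, column); dropna() removes the
# masked entries (some pandas versions already drop them inside stack()).
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs)
```

With three columns this yields three pairs, sorted from the most positive to the most negative correlation.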

Now, using a more systematic method of finding correlations among variables, we can see which traits are most strongly correlated with each other. Note that "Prominent Behavior" reflects observable phone usage behaviors, like feeling more fulfilled because of phone calls or hallucinating that one's phone is vibrating or ringing. Some of the results are fairly intuitive: prominent behavior and withdrawal symptoms are likely closely linked, since people who hallucinate that their phones are ringing are also likely the ones who spend a lot of time on their phones and experience withdrawal without them. Social discomfort and social anxiety have a similar relationship.

However, one notable correlation value is 0.538 for prominent behavior and mood changes, which suggests that someone who feels more confident on the phone (or less confident without it) may experience greater variations in emotion. Similarly, prominent behavior and social discomfort have a correlation value of 0.43, suggesting that someone who gains a great deal from the phone may feel less comfortable in social situations overall.
