# Programming Contest Nationality Analysis
This notebook analyzes contest placement data from an extended synthetic dataset of 100 contests. Each contest lists its top 10 competitors and their countries. The synthetic data is intended to approximate a representative sample, so the statistical methods can be demonstrated without internet access.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("extended_contests_v2.csv")
df.head()
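To run the notebook without the CSV, a stand-in table of the same shape can be generated. This is a minimal sketch under assumptions: the helper `make_synthetic_contests`, the `contest` column name and the country pool are hypothetical and do not reproduce `extended_contests_v2.csv`. Countries are drawn uniformly at random, so the resulting counts carry no real signal.

```python
import numpy as np
import pandas as pd


def make_synthetic_contests(n_contests=100, countries=None, seed=0):
    """Build a toy table with one row per (contest, rank) for ranks 1..10.

    The 'contest' column is a hypothetical identifier; the analysis cells
    only use 'rank' and 'country'.
    """
    if countries is None:
        # Hypothetical country pool, chosen only for illustration
        countries = ["Poland", "Portugal", "Czech Republic", "Hungary",
                     "China", "Russia", "USA", "Japan"]
    rng = np.random.default_rng(seed)
    rows = []
    for contest in range(1, n_contests + 1):
        # Each of the 10 finishers gets a country drawn uniformly at random
        drawn = rng.choice(countries, size=10)
        for rank, country in enumerate(drawn, start=1):
            rows.append({"contest": contest, "rank": rank, "country": str(country)})
    return pd.DataFrame(rows)


df_demo = make_synthetic_contests()
df_demo.head()
```

Assigning `df = make_synthetic_contests()` in place of the `read_csv` call should let the later cells run unchanged, since they only rely on `rank` and `country`.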

## Methodology
The dataset spans 100 contests and more than 30 countries. For each country, wins (rank 1), top-5 placements, and top-10 placements are tallied. A chi-square goodness-of-fit test then checks whether wins are distributed evenly across countries.

## Win Counts by Nationality

In [None]:
top1 = df[df['rank'] == 1]['country'].value_counts()
top5 = df[df['rank'] <= 5]['country'].value_counts()
top10 = df[df['rank'] <= 10]['country'].value_counts()

summary = pd.DataFrame({'Wins': top1, 'Top 5': top5, 'Top 10': top10}).fillna(0).astype(int)
summary
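Since each contest has exactly one winner, five top-5 finishers and ten top-10 finishers, the column totals of `summary` should equal 1, 5 and 10 times the number of contests, and each country's counts should be nested (Wins ≤ Top 5 ≤ Top 10). A minimal consistency check, assuming no ties or missing ranks; the helper name `check_summary` is hypothetical:

```python
import pandas as pd


def check_summary(summary: pd.DataFrame, n_contests: int) -> bool:
    """Return True if the tally table is internally consistent.

    Assumes exactly one row per rank 1..10 in every contest (no ties).
    """
    totals_ok = (
        summary["Wins"].sum() == n_contests
        and summary["Top 5"].sum() == 5 * n_contests
        and summary["Top 10"].sum() == 10 * n_contests
    )
    # Every win is also a top-5 finish, and every top-5 finish is a top-10 finish
    nested_ok = ((summary["Wins"] <= summary["Top 5"])
                 & (summary["Top 5"] <= summary["Top 10"])).all()
    return bool(totals_ok and nested_ok)


# Toy table covering two contests (hypothetical counts, not from the dataset)
toy = pd.DataFrame(
    {"Wins": [2, 0], "Top 5": [6, 4], "Top 10": [11, 9]},
    index=["A", "B"],
)
print(check_summary(toy, n_contests=2))
```

In the notebook, `check_summary(summary, 100)` would apply the same check to the real tallies; a `False` result would point to ties or incomplete contests in the CSV.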

## Visualizations

In [None]:
plt.figure(figsize=(8,4))
sns.barplot(x=top1.index, y=top1.values, palette='viridis')
plt.title('Contest Wins by Country')
plt.xlabel('Country')
plt.ylabel('Number of Wins')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(8,4))
sns.barplot(x=top5.index, y=top5.values, palette='magma')
plt.title('Top 5 Placements by Country')
plt.xlabel('Country')
plt.ylabel('Count in Top 5')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Polish Results vs Peers
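Poland's contest results are compared with those of countries with similar GDP, such as Portugal, the Czech Republic, and Hungary. As a starting point, the cell below computes Poland's share of all contest wins.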

In [None]:
poland_wins = top1.get('Poland', 0)
total_wins = top1.sum()
poland_share = poland_wins / total_wins if total_wins else 0

pd.DataFrame({'Polish Wins': [poland_wins], 'Total Wins': [total_wins], 'Poland Share': [poland_share]})
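The section names Portugal, the Czech Republic and Hungary as peers, but the cell above reports only Poland. A minimal sketch that extends the same share calculation to a list of countries; the helper `peer_shares` is hypothetical, and it assumes the country labels match the spelling in the CSV's `country` column (for example "Czech Republic" rather than "Czechia").

```python
import pandas as pd


def peer_shares(wins: pd.Series, countries: list) -> pd.DataFrame:
    """Wins and share of all wins for the listed countries.

    Countries absent from `wins` are reported with 0 wins.
    """
    total = wins.sum()
    # reindex keeps the requested order and fills missing countries with 0
    counts = wins.reindex(countries, fill_value=0)
    share = counts / total if total else counts * 0.0
    return pd.DataFrame({"Wins": counts, "Share of Wins": share})


# Hypothetical win counts, for illustration only
wins = pd.Series({"Poland": 8, "Hungary": 3, "China": 20, "USA": 9})
peers = ["Poland", "Portugal", "Czech Republic", "Hungary"]
print(peer_shares(wins, peers))
```

In the notebook, `peer_shares(top1, ['Poland', 'Portugal', 'Czech Republic', 'Hungary'])` would produce the comparison table directly from the win counts.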

## Statistical Test
Pearson's chi-square goodness-of-fit statistic is computed from the win counts to gauge whether wins depart from an even distribution across countries.
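For reference, with observed win counts $O_i$ for $k$ countries and $N = \sum_i O_i$ total wins, the statistic compares each count with the even-split expectation $E_i$:

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{N}{k}
$$

Under the null hypothesis of an even distribution, and when the expected counts are large enough (a common rule of thumb is $E_i \ge 5$), $\chi^2$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom.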

In [None]:
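import numpy as np

# Wins for every country with a top-10 finish, including countries with zero wins
observed = summary['Wins'].to_numpy(dtype=float)
# Even-distribution expectation; float dtype avoids np.full_like truncating the mean to an int
expected = np.full(len(observed), observed.mean())
chi_square = ((observed - expected) ** 2 / expected).sum()
chi_square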
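The statistic alone does not say how surprising it is. With 100 wins spread over more than 30 countries, each expected count is about 3 or fewer, below the common rule of thumb of 5 for the chi-square approximation, so a simulated p-value is a reasonable alternative. A minimal Monte Carlo sketch, assuming that under the null every win is equally likely to go to any country; the helper names are hypothetical, and this simulation is a substitute technique rather than the notebook's original method.

```python
import numpy as np


def chi_square_stat(counts):
    """Pearson chi-square statistic against an even split of the total."""
    counts = np.asarray(counts, dtype=float)
    expected = np.full(counts.shape, counts.mean())
    return float(((counts - expected) ** 2 / expected).sum())


def monte_carlo_p_value(counts, n_sims=10_000, seed=0):
    """Estimate P(chi2 >= observed) by simulating wins spread uniformly.

    Each simulation draws the same total number of wins from a multinomial
    with equal probability per country, then recomputes the statistic.
    """
    counts = np.asarray(counts, dtype=int)
    k, n = counts.size, int(counts.sum())
    observed = chi_square_stat(counts)
    rng = np.random.default_rng(seed)
    sims = rng.multinomial(n, np.full(k, 1.0 / k), size=n_sims)
    expected = n / k
    sim_stats = ((sims - expected) ** 2 / expected).sum(axis=1)
    # Add-one correction so the estimate is never exactly zero
    p = (np.sum(sim_stats >= observed) + 1) / (n_sims + 1)
    return observed, float(p)


# Hypothetical counts for four countries, for illustration only
stat, p = monte_carlo_p_value([100, 0, 0, 0])
print(stat, p)
```

In the notebook, `monte_carlo_p_value(summary['Wins'].to_numpy())` would return the statistic for all countries with a top-10 finish together with a simulated p-value. `scipy.stats.chisquare` computes the same statistic with an asymptotic p-value, which is less trustworthy when expected counts are this small.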