## Installing the required libraries

All the packages needed for this assignment were already installed for the previous one, except for seaborn. We install it by running `pip install seaborn` in the command prompt.

## Importing the libraries needed for the assignment

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# pandas needs xlrd to read the legacy .xls file used in section 5: pip install xlrd

## 1. Exploration of Historical Trends

The first thing we have to do is read the `source.xlsx` file into a pandas dataframe.

In [None]:
df = pd.read_excel('source.xlsx')

Since we need to count the nonviolent and violent campaigns per decade, we create a new column, `DECADE`, computed from the end year (`EYEAR`). We then group the data by decade and by the `VIOL` flag, and save the counts to a new dataframe, `df_grouped`, that keeps only the columns we need.

In [None]:
df['DECADE'] = (df['EYEAR'] // 10) * 10

df_grouped = df.groupby(['DECADE', 'VIOL']).size().unstack(fill_value=0).reset_index()
df_grouped.columns = ['DECADE','NON VIOLENT','VIOLENT']
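
To make the two steps above concrete, here is a minimal sketch on a small toy table. The rows are hypothetical and only illustrate how floor division maps a year to its decade and how `unstack` turns the `VIOL` values into columns (0 first, then 1), which is the order the column renaming above relies on.

```python
import pandas as pd

# Hypothetical toy data: end year and a 0/1 violence flag per campaign
toy = pd.DataFrame({
    "EYEAR": [1946, 1949, 1951, 1958, 1958, 1960],
    "VIOL":  [1,    0,    0,    1,    1,    0],
})

# Floor division drops the last digit of the year: 1958 -> 1950
toy["DECADE"] = (toy["EYEAR"] // 10) * 10

# One row per decade, one column per VIOL value (0 first, then 1);
# fill_value=0 covers decades where one of the two types never occurs
counts = toy.groupby(["DECADE", "VIOL"]).size().unstack(fill_value=0)
print(counts)
```

In the toy table the 1960s have no violent campaign, so `fill_value=0` is what keeps that cell from being empty.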


The last step is to save the columns we need to variables (optional, but it improves readability) and use them to create the stackplot. We call `sns.set(style="whitegrid")` to get the white grid lines and match the example given to us. For the same reason, we place the legend in the upper left and set the x-axis limits with `xlim`.

In [None]:

# Creating a stack plot

decades = df_grouped['DECADE']
nonviolent = df_grouped['NON VIOLENT']
violent = df_grouped['VIOLENT']

sns.set(style="whitegrid")
plt.stackplot(decades, violent, nonviolent, labels=['Violent', 'Nonviolent'])
plt.grid(True)
# Adding labels and title
plt.xlabel('decade')
plt.ylabel('campaigns')
plt.legend(loc='upper left')

plt.xlim(min(decades), max(decades))

# Display the plot
plt.show()

For the second plot we want the number of nonviolent campaigns per decade and the percentage of them that succeeded, so we first separate the data accordingly.

First, we select the campaigns that were both successful and nonviolent and group them by decade to count them. We save the result to a new dataframe called `success_by_decade`.

In [None]:
success_by_decade = df[(df['SUCCESS'] == 1) & (df['VIOL'] == 0)]
success_by_decade = success_by_decade.groupby('DECADE').size().reset_index(name='success_count')

After that, we select all the nonviolent campaigns and count them per decade, so we can use the totals later to calculate the percentages. We save this dataframe as `total_by_decade`.

In [None]:
total_by_decade = df[(df['VIOL'] == 0)]
total_by_decade = total_by_decade.groupby('DECADE').size().reset_index(name='total_count')

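Now we merge the two dataframes we just created into a new one, `merged_df`. The left merge keeps every decade that has nonviolent campaigns, and `fillna(0)` sets the success count to 0 for decades with no successful ones. We then get the success rate by dividing the number of successful nonviolent campaigns by the total number of nonviolent campaigns. Note that the `success_percentage` column holds a fraction between 0 and 1.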

In [None]:
merged_df = pd.merge(total_by_decade, success_by_decade, on='DECADE', how='left').fillna(0)
merged_df['success_percentage'] = (merged_df['success_count'] / merged_df['total_count'])
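
The effect of the left merge and `fillna(0)` is easiest to see on a tiny example. The sketch below uses hypothetical counts (not the real data) in which one decade has campaigns but no successes.

```python
import pandas as pd

# Hypothetical per-decade counts; 1960 has campaigns but no successes
total = pd.DataFrame({"DECADE": [1950, 1960, 1970], "total_count": [4, 2, 5]})
success = pd.DataFrame({"DECADE": [1950, 1970], "success_count": [1, 3]})

# A left merge keeps every decade from `total`; the missing 1960 success
# count becomes NaN and is then replaced by 0
rates = pd.merge(total, success, on="DECADE", how="left").fillna(0)
rates["success_rate"] = rates["success_count"] / rates["total_count"]
print(rates)
```

With an inner merge the 1960 row would disappear entirely, and the bar for that decade would be missing from the plot rather than shown with a rate of 0.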

And now we just create the plot.

In [None]:

width = 2.5  # the width of the bars
x = merged_df['DECADE'] # SETTING THE X AXIS AS THE DECADES
y1 = merged_df['total_count'] # SETTING THE Y1 AXIS AS THE TOTAL COUNT
y2 = merged_df['success_percentage'] # SETTING THE Y2 AXIS AS THE SUCCESS PERCENTAGE

fig = plt.figure(figsize=[12,6]) # SETTING THE FIGURE SIZE
ax1 = fig.add_subplot(111) # SETTING THE SUBPLOT
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

ax1.bar(x - width/2, y1, width, color='b', label='number of nonviolent campaigns') # PLOTTING THE BAR CHART 1
ax1.grid(False) # REMOVING THE GRID
ax1.set_ylabel('campaigns') # SETTING THE Y1 LABEL
ax2.set_ylabel('percentage') # SETTING THE Y2 LABEL
ax2.bar(x + width/2, y2, width, color='orange', label='percentage success') # PLOTTING THE BAR CHART 2
ax2.grid(False) # REMOVING THE GRID

fig.legend(bbox_to_anchor=(0.12, 0.88), loc='upper left', fontsize=12) # SETTING THE LEGEND

plt.show()

For the first part of the third plot we do exactly as above, but this time for the violent campaigns instead of the nonviolent ones.

In [None]:
success_by_decade2 = df[(df['SUCCESS'] == 1) & (df['VIOL'] == 1)]
success_by_decade2 = success_by_decade2.groupby('DECADE').size().reset_index(name='success_count')

total_by_decade2 = df[(df['VIOL'] == 1)]
total_by_decade2 = total_by_decade2.groupby('DECADE').size().reset_index(name='total_count')

merged_df2 = pd.merge(total_by_decade2, success_by_decade2, on='DECADE', how='left').fillna(0)
merged_df2['success_percentage'] = (merged_df2['success_count'] / merged_df2['total_count'])

Now we only need to plot the bars, using the success rates of the violent and nonviolent campaigns as the two sets of bar heights and the decades as the x axis.

In [None]:
width = 2.5  # the width of the bars
fig = plt.figure(figsize=[15,8]) # SETTING THE FIGURE SIZE
x = merged_df2['DECADE'] # SETTING THE X AXIS AS THE DECADES
y1 = merged_df2['success_percentage'] # SETTING THE Y1 AXIS AS THE SUCCESS PERCENTAGE OF VIOLENT CAMPAIGNS
y2 = merged_df['success_percentage'] # SETTING THE Y2 AXIS AS THE SUCCESS PERCENTAGE OF NONVIOLENT CAMPAIGNS
ax = fig.add_subplot(111) # SETTING THE SUBPLOT
b2 = ax.bar(x + width/2, y1, width, color='orangered') # PLOTTING THE BAR CHART 1
b1 = ax.bar(x - width/2, y2, width, color='b') # PLOTTING THE BAR CHART 2
plt.xlabel('decade') # SETTING THE X LABEL
plt.ylabel('success rate') # SETTING THE Y LABEL
ax.legend((b1[0], b2[0]), ('nonviolent', 'violent'), fontsize=12) # SETTING THE LEGEND

plt.show()
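
The cell above takes `x` from `merged_df2` and `y2` from `merged_df`, which only lines up if both tables cover the same decades in the same order. If that is ever not the case, one option is to align the two tables on `DECADE` before plotting. A minimal sketch, assuming hypothetical toy rates in which the violent table lacks one decade:

```python
import pandas as pd

# Hypothetical success rates; the violent table has no row for the 1940s
nonviolent = pd.DataFrame({"DECADE": [1940, 1950, 1960],
                           "success_percentage": [0.5, 0.25, 0.4]})
violent = pd.DataFrame({"DECADE": [1950, 1960],
                        "success_percentage": [0.2, 0.3]})

# An outer merge on DECADE keeps every decade present in either table,
# and the suffixes keep the two rate columns apart
aligned = pd.merge(nonviolent, violent, on="DECADE", how="outer",
                   suffixes=("_nonviolent", "_violent"))
aligned = aligned.fillna(0).sort_values("DECADE").reset_index(drop=True)
print(aligned)
```

Filling with 0 is a simplification here: it draws an empty bar for a decade with no campaigns of that type, which is not the same as a decade where every campaign failed.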

## 2. Largest Resistance Campaigns, 1946-2014

First we need to keep only the campaigns where the participation was at least 2% of the population. We save the result to a new dataframe called `filtered_df`.

In [None]:
# Filter campaigns with participation percentage at least 2.0%
filtered_df = df[df['PERCENTAGE POPULAR PARTICIPATION'] >= 0.02]

Before creating the plot, we sort the dataframe by participation percentage in descending order and multiply the percentage column by 100, so the numbers are in the same format as in the example plot.

In [None]:
# Sort the DataFrame by participation percentage in descending order
sorted_df = filtered_df.sort_values(by='PERCENTAGE POPULAR PARTICIPATION', ascending=False)
sorted_df['PERCENTAGE POPULAR PARTICIPATION'] = sorted_df['PERCENTAGE POPULAR PARTICIPATION'] * 100.0

Now all we need to do is create the plot.

 * We create the bar plot with the percentage on the x axis and the location and target, joined into one string (to match the example), on the y axis. The initial bar color is black.
 
 * In the for loop we write the value inside each bar. We also set the color of each bar based on whether the campaign achieved limited success, succeeded, or failed.

 * Then we set the labels and use `gca` to format the x-axis ticks as percentages (0.0%, 10.0%, etc.).

 * Finally, we create a custom legend for the plot.


In [None]:
plt.figure(figsize=(12, 10)) # SETTING THE FIGURE SIZE

ax = sns.barplot(x=sorted_df['PERCENTAGE POPULAR PARTICIPATION'], y=sorted_df['LOCATION']+': '+sorted_df['TARGET'], data=sorted_df, color='black') # PLOTTING THE BAR CHART

for bar, values,success,limited in zip(ax.patches, sorted_df['PERCENTAGE POPULAR PARTICIPATION'],sorted_df['SUCCESS'],sorted_df['LIMITED']): # SETTING THE TEXT ON THE BAR CHART AND COLORING THE BARS
    text_x = bar.get_width() # SETTING THE TEXT X AXIS
    text_y = bar.get_y() + bar.get_height() / 2 # SETTING THE TEXT Y AXIS
    text = '{:.2f}'.format(values) # SETTING THE TEXT
    ax.text(text_x, text_y, text, ha='right', va='center', fontsize=10,color='white') # SETTING THE TEXT PARAMETERS
    if  limited == 1: 
        bar.set_color('grey') # SETTING THE COLOR OF THE BAR IF THE CAMPAIGN ACHIEVED MAJOR CONCESSIONS SHORT OF FULL SUCCESS
    elif success == 0:
        bar.set_color('red') # SETTING THE COLOR OF THE BAR IF THE CAMPAIGN FAILED
    else:
        bar.set_color('black')  # SETTING THE COLOR OF THE BAR IF THE CAMPAIGN SUCCEEDED

plt.xlabel('Percent population participating in peak event') # SETTING THE X LABEL
plt.ylabel('Campaign') # SETTING THE Y LABEL
plt.xlim(0, 50) # SETTING THE X LIMIT
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter('{:.1f}%'.format)) # SETTING THE X AXIS FORMAT

# Create a custom legend
legend_labels = ['Campaign failed','Campaign achieved major concessions short of full success','Campaign succeeded'] # SETTING THE LEGEND LABELS
legend_colors = ['red','grey','black'] # SETTING THE LEGEND COLORS
legend_elements = [plt.Rectangle((0, 0), 1, 1, color=color, linewidth=1) for color in legend_colors] # SETTING THE LEGEND ELEMENTS
plt.legend(legend_elements, legend_labels, title='', loc='lower right', fontsize=12) # SETTING THE LEGEND PARAMETERS

plt.show() # DISPLAYING THE PLOT
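
The color rule inside the loop can also be written as a small helper, which makes the precedence explicit: the `LIMITED` flag is checked first, so a campaign flagged as limited is grey regardless of its `SUCCESS` value. The function name `outcome_color` is our own and is only a sketch of the same logic.

```python
def outcome_color(success, limited):
    """Return the bar color for a campaign, with the same precedence as the loop above."""
    if limited == 1:
        return "grey"   # major concessions short of full success
    if success == 0:
        return "red"    # campaign failed
    return "black"      # campaign succeeded


# (success, limited) pairs: succeeded, limited, failed
colors = [outcome_color(s, l) for s, l in [(1, 0), (0, 1), (0, 0)]]
print(colors)
```

Inside the loop the helper would be used as `bar.set_color(outcome_color(success, limited))`.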

## Some information about one of the best-known Greek resistance campaigns

The resistance against the military junta in Greece (1967-1974) was a period of intense political and social struggle against the authoritarian regime that had seized power through a coup d'état on April 21, 1967.

#### Background:

The military junta, also known as the Regime of the Colonels, was led by a group of military officers headed by Colonel George Papadopoulos. The junta suspended civil liberties, dissolved political parties, censored the media, and established a repressive regime that suppressed dissent.

<img src='R.jpg' height = 200px>

#### Opposition Movements:

The resistance to the junta took various forms and involved a wide range of groups and individuals. Opposition was not limited to a single faction but included students, intellectuals, left-wing activists, and even some conservative elements that opposed the military rule.

One of the most significant events of the resistance occurred in November 1973 at the Athens Polytechnic University. Students staged a massive protest against the junta, demanding an end to the dictatorship and the restoration of democracy. The junta responded with a brutal crackdown, sending in the military to suppress the uprising. The exact number of casualties remains disputed, but the event marked a turning point and increased opposition to the regime both domestically and internationally.

<img src='novmb1704.jpg'>

#### End of the Junta (1974):

The junta's rule came to an end in July 1974 following the Cyprus crisis. A coup backed by the junta against Archbishop Makarios III, the president of Cyprus, led to the Turkish invasion of Cyprus. This event weakened the junta's position, and civilian unrest in Greece escalated. On July 23, 1974, the junta collapsed, leading to the restoration of democracy in Greece.

#### Legacy:

The resistance against the military junta is remembered as a pivotal moment in Greek history. The sacrifices made by those who opposed the dictatorship are commemorated annually, especially the events at the Athens Polytechnic University. The struggle against the junta left a lasting impact on Greek society and contributed to a renewed commitment to democratic values and human rights.

The period of the military junta and its subsequent downfall shaped the political landscape of Greece in the years that followed, with a lasting impact on the country's commitment to democracy and political pluralism.


## 3. The Effect of Participation on the Probability of Campaign Success

In [None]:
from statsmodels.formula.api import logit

scatter_df = df[['SUCCESS', 'PARTICIPATION', 'PERCENTAGE POPULAR PARTICIPATION']].copy()

scatter_df['POPULATION'] = scatter_df['PARTICIPATION'] / scatter_df['PERCENTAGE POPULAR PARTICIPATION'] # calculate population

scatter_df['LOGPOP'] = np.log(scatter_df['POPULATION']) # log population

scatter_df['LOGPART'] = np.log(scatter_df['PARTICIPATION']) # log participation number

scatter_df['MEMPC'] = scatter_df['LOGPART'] / scatter_df['LOGPOP'] # ratio of log participation to log population

#count_success = len(data[data['SUCCESS'] == 1])
#count_fail = len(data[data['SUCCESS'] == 0])

#perc_fail = count_fail/(count_fail+count_success)
#print("percentage of fail is", round(perc_fail*100,2))
#perc_success = count_success/(count_fail+count_success)
#print("percentage of success is", round(perc_success*100,2))
#data['SUCCESS'].value_counts()

#sns.countplot(x='SUCCESS',data=data, palette='hls',hue='SUCCESS')

# check for missing values

#missing_values = data.isnull().sum()
#display(missing_values)

# dropping the null values

scatter_df = scatter_df.dropna(how='any',axis=0)

#missing_values = data.isnull().sum()
#display(missing_values)

scatter_linear_df = scatter_df.copy()
#convert mempc to linear scale
scatter_linear_df['MEMPC'] = 10**scatter_df['MEMPC']

linear_success = logit("SUCCESS ~ MEMPC + LOGPOP", scatter_linear_df).fit()
linear_success.summary2()

X = scatter_df[['MEMPC', 'LOGPOP']]
y_prob = 1 / (1 + np.exp(-linear_success.fittedvalues))  # logistic function applied to the fitted linear predictor of linear_success
x = X.iloc[:, 0]

plt.figure(figsize=[12, 8])
plt.scatter(x, y_prob)
plt.ylabel('Probability of Success', fontsize=12)
plt.xlabel('Participants per Capita', fontsize=12)

plt.show()
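
The probability line in the cell above applies the logistic function to the model's linear predictor, which is how we read statsmodels' `fittedvalues` for a logit model. The sketch below shows the same calculation by hand with numpy. The coefficients are hypothetical and are not the ones estimated from the data.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients, NOT the fitted values from the model above
intercept, b_mempc, b_logpop = -5.0, 8.0, -0.1

mempc = np.array([0.2, 0.5, 0.8])      # toy MEMPC values
logpop = np.array([15.0, 15.0, 15.0])  # toy LOGPOP values, held constant

# Linear predictor first, then the logistic transform
linear_predictor = intercept + b_mempc * mempc + b_logpop * logpop
prob = sigmoid(linear_predictor)
print(prob)
```

With a positive MEMPC coefficient and the population held constant, the probability rises with MEMPC, which is the shape the scatter plot is meant to show.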



In [None]:
# Reuses scatter_df (cleaned and transformed in the previous cell)

# Create a new logistic regression model with only MEMPC as a predictor
scatter_linear_df2 = scatter_df.copy()  # MEMPC is kept as is, without the 10** transform used above

linear_success_mempc = logit("SUCCESS ~ MEMPC", scatter_linear_df2).fit()

# Display the summary of the logistic regression model
linear_success_mempc.summary2()

# Predict the probabilities
X_mempc = scatter_linear_df2[['MEMPC']]
y_prob_mempc = 1 / (1 + np.exp(-linear_success_mempc.fittedvalues))

# Create the new scatter plot
plt.figure(figsize=[12, 8])
plt.scatter(X_mempc, y_prob_mempc)
plt.ylabel('Probability of Success', fontsize=12)
plt.xlabel('Participants per Capita, logged', fontsize=12)

plt.show()

## 4. The Level of Participation Tipping Point

In [None]:
import pandas as pd

new_df = df[['SUCCESS', 'PARTICIPATION', 'PERCENTAGE POPULAR PARTICIPATION', 'CAMPAIGN']].copy()

# Coerce the participation column to numeric; entries that cannot be parsed become NaN
new_df['PERCENTAGE POPULAR PARTICIPATION'] = pd.to_numeric(new_df['PERCENTAGE POPULAR PARTICIPATION'], errors='coerce')
# Convert the fraction to a percentage so it matches the bin edges below
new_df['PERCENTAGE POPULAR PARTICIPATION'] = new_df['PERCENTAGE POPULAR PARTICIPATION'] * 100.0

# Define the bins for categorization
bins = [0.0001, 0.0035, 0.015, 0.06, 0.25, 1.0, 3.5, float('inf')]
labels = ['less than 0.0035%', '0.0035% - 0.015%', '0.015% - 0.06%', '0.06% - 0.25%', '0.25% - 1.0%', '1.0% - 3.5%', 'greater than 3.5%']

# Categorize the data based on the specified bins
new_df['Peak Popular Participation (%)'] = pd.cut(new_df['PERCENTAGE POPULAR PARTICIPATION'], bins=bins, labels=labels, right=False)

# Group by the Participation Category and calculate the observations and success rate
result = new_df.groupby('Peak Popular Participation (%)').agg({'SUCCESS': 'sum', 'CAMPAIGN': 'count'}).reset_index()

# Handle zero counts to prevent division by zero
result['Success Rate'] = result.apply(lambda row: (row['SUCCESS'] / row['CAMPAIGN']) * 100 if row['CAMPAIGN'] != 0 else 0, axis=1)

# Rename columns and format the output
result = result.rename(columns={'CAMPAIGN': 'Observations'})
result = result[['Peak Popular Participation (%)', 'Observations', 'Success Rate']]
result['Success Rate'] = result['Success Rate'].round(2).astype(str) + '%'

result = result.sort_values(by='Peak Popular Participation (%)', ascending=False).reset_index(drop=True)

# Display the result
display(result)
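
Because the table depends on how `pd.cut` treats values that sit exactly on a bin edge, here is a minimal sketch with the same bins and labels on a few hypothetical percentages. With `right=False` each bin is closed on the left, so a value equal to an edge goes into the bin that starts at that edge.

```python
import pandas as pd

# Hypothetical participation percentages, some exactly on a bin edge
pct = pd.Series([0.002, 0.0035, 0.5, 1.0, 3.5, 12.0])

bins = [0.0001, 0.0035, 0.015, 0.06, 0.25, 1.0, 3.5, float("inf")]
labels = ["less than 0.0035%", "0.0035% - 0.015%", "0.015% - 0.06%",
          "0.06% - 0.25%", "0.25% - 1.0%", "1.0% - 3.5%", "greater than 3.5%"]

# right=False makes each bin closed on the left: [low, high)
binned = pd.cut(pct, bins=bins, labels=labels, right=False)
print(binned.tolist())

# A value below the first edge falls outside every bin and becomes NaN
below = pd.cut(pd.Series([0.00005]), bins=bins, labels=labels, right=False)
print(below.isna().tolist())
```

The last two lines show a side effect of starting the bins at 0.0001: any campaign below that percentage is left out of the table instead of being counted in the first row.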

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from statsmodels.formula.api import logit

# Reuses scatter_df and logit from section 3

scatter_linear_df2 = scatter_df.copy()

linear_success_mempc = logit("SUCCESS ~ MEMPC", scatter_linear_df2).fit()

# Predict the probabilities
X = scatter_linear_df2[['MEMPC']]
y_prob = 1 / (1 + np.exp(-linear_success_mempc.fittedvalues))

# Combine X, y_prob into a DataFrame for sorting
sorted_data = pd.DataFrame({'X': X['MEMPC'], 'y_prob': y_prob})
sorted_data = sorted_data.sort_values(by='X')

plt.figure(figsize=[12, 8])

# Use sns.lineplot to plot the line
sns.lineplot(x=sorted_data['X'], y=sorted_data['y_prob'], label='Line of Probability',linewidth=3)

# Add a shaded band of fixed width around the line (illustrative only, not a confidence interval estimated from the model)
ci = 0.1
plt.fill_between(sorted_data['X'], (sorted_data['y_prob'] - ci), (sorted_data['y_prob'] + ci), color='b', alpha=.1)

plt.ylabel('Probability of success', fontsize=18)
plt.xlabel('Participants per capita, logged', fontsize=18)
plt.axhline(y=0.99, color='red', linestyle='--', label='Threshold')

plt.legend()
plt.show()
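
The plot marks the 0.99 threshold with a horizontal line, but the MEMPC value at which the curve crosses it can also be computed directly by inverting the logistic function. A minimal sketch, assuming a single-predictor model and hypothetical coefficients (the real ones would come from `linear_success_mempc.params`):

```python
import math


def mempc_at_probability(p, intercept, slope):
    """Solve p = 1 / (1 + exp(-(intercept + slope * x))) for x."""
    log_odds = math.log(p / (1.0 - p))
    return (log_odds - intercept) / slope


# Hypothetical coefficients, NOT taken from the fitted model above
intercept, slope = -5.0, 8.0

x_star = mempc_at_probability(0.99, intercept, slope)

# Check: plugging x_star back into the logistic function recovers 0.99
p_back = 1.0 / (1.0 + math.exp(-(intercept + slope * x_star)))
print(round(x_star, 4), round(p_back, 4))
```

The result is on the same scale as the x axis of the plot, so it is a ratio of logs and has to be converted back before it can be read as a share of the population.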



## 5. Nonviolent Resistance and Target Regime Type

In [None]:
import pandas as pd

p5_df = pd.read_excel('p5v2018.xls')

# Make sure both year columns are numeric so the merge keys are comparable
df['EYEAR'] = pd.to_numeric(df['EYEAR'], errors='coerce')
p5_df['year'] = pd.to_numeric(p5_df['year'], errors='coerce')

# Merge on end year and country name (this reuses the name merged_df from section 1)
merged_df = pd.merge(df, p5_df, how='inner', left_on=['EYEAR', 'LOCATION'], right_on=['year', 'country'])

# Drop the duplicate year column
merged_df = merged_df.drop(columns=['year'])

merged_df['POPULATION'] = merged_df['PARTICIPATION'] / merged_df['PERCENTAGE POPULAR PARTICIPATION'] # calculate population

merged_df['LOGPOP'] = np.log(merged_df['POPULATION']) # log population

merged_df['LOGPART'] = np.log(merged_df['PARTICIPATION']) # log participation number

filtered_df = merged_df[['SUCCESS','NONVIOL','polity','LOGPART','LOGPOP']]

#percent_missing = filtered_df.isnull().sum() * 100 / len(filtered_df)
#percent_missing

filtered_df = filtered_df.dropna(how='any',axis=0)

#percent_missing = filtered_df.isnull().sum() * 100 / len(filtered_df)
#percent_missing

log_df = logit("SUCCESS ~ NONVIOL + polity + LOGPART + LOGPOP",filtered_df).fit()
log_df.summary2()
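
The inner merge above silently drops every campaign whose country name is spelled differently in the two files. One way to see how many rows are lost is a left merge with `indicator=True`. The sketch below uses hypothetical rows and toy polity scores, not the real data.

```python
import pandas as pd

# Hypothetical rows: one country is named differently in the two sources
campaigns = pd.DataFrame({"EYEAR": [1974, 1988],
                          "LOCATION": ["Greece", "Burma"]})
polity = pd.DataFrame({"year": [1974, 1988],
                       "country": ["Greece", "Myanmar (Burma)"],
                       "polity": [5, -7]})  # toy scores

# indicator=True adds a _merge column telling where each row was found
check = pd.merge(campaigns, polity, how="left",
                 left_on=["EYEAR", "LOCATION"], right_on=["year", "country"],
                 indicator=True)

# Rows marked left_only found no match and would be dropped by an inner merge
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["EYEAR", "LOCATION"]])
```

If the list of unmatched rows is long, harmonizing the country names before the merge would keep more observations in the regression.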

