**<center> <span style="color:red;font-family:serif; font-size:32px;">Analyzing the Axial Age 🌏</span> </center>**

<img src="https://geopolicraticus.files.wordpress.com/2010/08/axial-cultures-map.jpg">

## 1.  Understanding the Context

The **Axial Age** dataset tracks a variety of sociopolitical norms and their development across **key areas in Afro-Eurasia**. For each of 10 NGAs (**natural geographic areas**), a group of experts agreed on a score for each **sociopolitical norm** at each date. Dates are recorded in 100-year increments, over time spans that vary by region between **5300 BCE and 1800 CE**, and the agreed scores were compiled into the dataset.

1. Simply put, the **Axial Age dataset** records many sociopolitical norms that were prevalent across some of the historically important areas of Afro-Eurasia.

2. A group of experts assigned each norm a specific score for each date within the 10 areas mentioned above.



## 2.  Understanding the term "Axial Age"

* The **"Axial Age"**, also called the **"Axis Age"**, is a period during which most of the inhabited world underwent a surge of sociopolitical reforms at around the same time.


* The norms established back then **subsequently changed the course of human history**.


* Representative Axial traditions that emerged and took root during that time include the **ancient Greek philosophers**, **Indian metaphysicians and logicians** (who articulated the great traditions of Hinduism, Buddhism, and Jainism), **Persian Zoroastrianism**, **the Hebrew Prophets**, and the **"Hundred Schools"** (most notably Confucianism and Daoism) of ancient China.


* The phrase **"Axial Age"** was coined by the German psychiatrist and philosopher **Karl Jaspers**, who noted that during this period there was a shift (or a turn, as if on an axis) away from predominantly localized concerns and toward transcendence.

## 3. Proceeding on with the Dataset

*The first step is to import the necessary libraries to work with the dataset*

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


#Importing all the necessary libraries

In [None]:
df = pd.read_csv("../input/axial-age-dataset/AxialAgeDataset.csv")
df.head(10)

#Reading the dataset and viewing the first ten rows for an overview

*Before proceeding with the data analysis, let us first try to understand this dataset*

## 4.  Understanding the data

*Let us try to look at each column and see what that means in context to the dataset*

1. **Date.From** :- This refers to the **date(Year)** from which the data about the particular region is taken into account.


2. **NGA** :- NGA stands for **"Natural Geographic Area"** or the area which is taken into consideration


3. **Moralistic Punishment** :- From this column onwards, the columns are the **various sociopolitical norms** that were prevalent across these regions. Each is given a score from 0 to 1 based on the conditions that existed at that time.


4. **Moralizing Norms**:- Moral norms are the rules of morality that people ought to follow. An evolutionary explanation of the emergence of moral norms proceeds in stages


5. **Promotion of Prosociality** :- Promotion of Prosocial behavior that **"benefits other people or society as a whole"**, such as helping, sharing, donating, co-operating, and volunteering


6. **Omniscient Supernatural beings** :- Belief in **all-knowing beings that exist outside this realm** and are the cause of supernatural occurrences


7. **Rulers not gods** :- Whether **rulers were regarded as human rather than as gods**


8. **Equating elites and commoners** :- Treating the **nobles** and the **common folk** as being of **equal importance**



9. **Formal Legal Code** :- A code of law that **purports to exhaustively cover a complete system of laws** or a particular area of law as it existed at the time the code was enacted, **by a process of codification**.


10. **General Applicability of Law** :- In the context of this dataset, this most plausibly refers to whether the law **applies to everyone in the society, elites and rulers included, rather than only to particular groups**.


11. **Constraints on executives**


12. **Full time bureaucrats**


13. **Impeachment** :- Impeachment, in common law, **a proceeding instituted by a legislative body to address serious misconduct by a public official**

*Now that we have understood the data, let us proceed with the dataset*

In [None]:
print('This dataset has {} rows and {} columns'.format(df.shape[0], df.shape[1]))

*Let us now look at a brief overview of this dataset*

In [None]:
df.info()

### Inferences:-

1. There are a total of **15 columns**, almost all of which are numeric, **except the NGA column**

2. There are **428 rows** in total, indexed from 0 to 427

3. Comparing the non-null counts against the total of 428 rows, we can see that all but 5 columns have missing values

*Let us now get a statistical overview of this data*

In [None]:
df.describe()

### Inferences:-

1. The dates in this dataset range from **5300 BCE to 1800 CE** in increments of 100 years.

2. Contrary to the popular belief that gods and supernatural beings were universally believed in during these times, the mean score for **Omniscient supernatural beings** is **only about 0.45** across the entire dataset. This is an average of expert-assigned scores, not a percentage of people.

3. The **Rulers not gods** norm is scored as present in the majority of records.

4. There are a **lot of missing values** in this dataset.

*Let us now try to understand the point system for each column to see what intermediate values we can have aside from 0 and 1*

In [None]:
for i in df.iloc[:,2:-1].columns:
    print(f'{i} : {df[i].unique()}')

As we can see, **each column has a set of 4 recurring values for scores** (apart from missing values). From this we can infer that:

a)  The scoring uses only **4 unique scores**, which are **0, 0.5, 0.6 and 1**

b)  **0** means the norm was **not followed** in that region


c)  **1** means the norm was **fully followed** in that region


d)  0.5 and 0.6 mean the norm was partially followed in that region
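To see how often each score occurs, and not just which scores exist, we can count them per column. The sketch below uses a small made-up frame (the values are illustrative and not taken from the dataset); `dropna=False` keeps the missing values as their own category.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two score columns (illustrative values only)
toy = pd.DataFrame({
    "Moralizing_norms": [0.0, 1.0, 0.5, np.nan, 1.0],
    "Impeachment": [0.0, 0.0, np.nan, np.nan, 1.0],
})

# Count every distinct score per column, keeping NaN as a separate entry
counts = {col: toy[col].value_counts(dropna=False) for col in toy.columns}

for col, vc in counts.items():
    print(col, vc.to_dict())
```

On the real data, the same loop over `df.columns[2:14]` would show how many records carry each score and how many are missing.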

In [None]:
df.columns

#Looking at the column names

*Looking at the column names, we can see that although the headings make sense, we can tidy them up by removing the numbers at the beginning and replacing the spaces with underscores*

*Let us now do some string manipulation to remove the numbers and the white space in the column names*

**First, in order to make changes to the columns, we take the column names and convert them into a Pandas Series so that we can apply a function to each name**

In [None]:
cols = pd.Series(df.columns)
cols

#Made the column names as a series

*Now that the column names are a Series, we can write a function that works as follows:* **"If the column name begins with a number, we omit the number and the other unnecessary characters before the actual column name. Then we replace blank spaces with underscores ("_")."**

*The same has been done below*
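As an alternative to slicing off a fixed number of characters, the same rule can be sketched with a regular expression, which handles one-digit and two-digit prefixes in a single step. The raw header names below are hypothetical, since the exact prefixes in the file may differ, and `clean_name` is a helper introduced only for this illustration.

```python
import re

# Hypothetical raw headers; the real file's prefixes may look different
raw_names = ["Date.From", "NGA", "1. Moralistic punishment", "10. Constraint on executive"]

def clean_name(name):
    # Remove a leading number, an optional dot and any whitespace after it,
    # then replace the remaining spaces with underscores
    stripped = re.sub(r"^\d+\.?\s*", "", name)
    return stripped.replace(" ", "_")

cleaned = [clean_name(n) for n in raw_names]
print(cleaned)
```

Names that do not start with a digit, such as `Date.From` and `NGA`, pass through unchanged.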

In [None]:
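# For names that start with a digit, drop the first 3 characters (the numeric prefix) and use underscores
cols = cols.apply(lambda x: x[3:].replace(' ','_') if x[0].isdigit() else x)
# Columns 11 to 13 presumably carry two-digit prefixes, so one extra leading character is stripped
cols[11:14] = cols[11:14].apply(lambda x: x[1:])
cols

#All the column names have been successfully updated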

**Now let us assign these new column names to the dataframe to finally finish this process**

In [None]:
df.columns = cols
df

*Now that we have successfully updated these column names, let us proceed with the next step*

## 5.  Handling Missing Values and Modifying the data

**Null values are present in almost all datasets and need to be dealt with before proceeding to the data analysis**



In [None]:
df.isnull().sum()

*We can see that the majority of columns contain **null values**, so let us try to understand where the null values are present*

In [None]:
import random

fig_6 = plt.figure(figsize=(10,5),dpi=100)
axes_6 = fig_6.add_axes([0.1,0.1,0.9,0.9])

# Create a pie chart of each column's share of the missing values

cols = [i for i in df.columns if df[i].isnull().sum()!=0]

vals = [df[i].isnull().sum() for i in cols]

# One random RGB colour per wedge
colors = []

for i in range(len(vals)):
    colors.append((random.uniform(0,1),random.uniform(0,1),random.uniform(0,1)))

plt.pie(vals,colors=colors,labels=cols,autopct='%.0f%%')

### Inferences:-

1. **"Moralistic_punishment"** and **"Omniscient_supernatural_beings"** have the **highest percentage** of **null values**



2. **"Equating_elites_and_commoners"** and **"Equating_rulers_and_commoners"** have the **smallest share** of **null values**



3. Three columns have an equal percentage of null values, so let us see if there is a pattern in this

In [None]:
df[cols].dtypes

*The null values are present only in the columns that contain the scores of the social norms.*

*We therefore assume that a null value means there is no record of that norm in this dataset, and treat it as 0. This is a simplifying assumption: a missing record is not the same as evidence that the norm was absent.*
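To get a feel for what this assumption does to the averages, here is a minimal sketch on made-up scores (illustrative values, not taken from the dataset). It compares the mean when missing values are ignored with the mean after they are replaced by 0.

```python
import numpy as np
import pandas as pd

# Four illustrative records, two of them missing
scores = pd.Series([1.0, 0.5, np.nan, np.nan])

# pandas skips NaN by default, so only the two known scores are averaged
mean_ignoring_nan = scores.mean()

# After filling with 0, the missing records count as "norm absent"
mean_after_fill = scores.fillna(0.0).mean()

print(mean_ignoring_nan, mean_after_fill)
```

The filled mean is lower because every missing record now counts as a 0, which is worth keeping in mind when reading the averages and correlations later on.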

In [None]:
df = df.fillna(0.0)
df.isnull().sum()

**Null values have been taken care of in this dataset so let us proceed with the Exploratory Data Analysis**

## 6.  Exploratory Data Analysis

<img src="https://www.datadecisionsgroup.com/hubfs/img/eda3.jpg">

In [None]:
df.head(15)

In [None]:
plt.figure(figsize=(10,4))

plt.plot(df['NGA'],df['Date.From'],color='navy', alpha=0.75, lw=1,marker='o',ls='dotted',markersize=7,markerfacecolor='r',markeredgecolor='y')
plt.xlabel("NGA")
plt.ylabel("Date")
plt.title("NGAs vs Time Periods")
plt.tight_layout()
plt.xticks(rotation=68)

When we compare the regions in the data (the **NGAs**) with the **time periods** for which data was gathered, we can see that the records are spaced at **uniform intervals**, while the span covered differs by region.


1. **Kansai** has data ranging from **5300 BC** all the way up to **1600 AD**. According to this graph, it is the region with the most data.




2. **Cambodian Basin** ranges from **220 AD** to **1600 AD**, so this region has less data than Kansai.




Most of the other regions have time spans that fall between these two.
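The same comparison can be read off as numbers instead of from the plot. The following is a minimal sketch on a made-up frame with two placeholder regions, "A" and "B"; it lists the first date, the last date and the number of records per region.

```python
import pandas as pd

# Toy frame with two placeholder regions (illustrative values only)
toy = pd.DataFrame({
    "NGA": ["A", "A", "A", "B", "B"],
    "Date.From": [-300, -200, -100, 200, 300],
})

# Earliest date, latest date and record count for each region
coverage = toy.groupby("NGA")["Date.From"].agg(["min", "max", "count"])
print(coverage)
```

Applied to `df`, this would give the exact span and record count of every NGA.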





*Now, let us try to see the frequency of the time periods and analyse their patterns*

In [None]:
# histplot replaces distplot, which is deprecated in recent seaborn releases
sns.histplot(df['Date.From'],bins=10,kde=False,color='y')
plt.title('Frequency of Time Periods')
plt.xlabel('Time Periods')
plt.ylabel('Frequency')

The visualization above shows the **frequency of the time periods**.

The frequency **diminishes** the further the time period goes **back into BC**.

This means that **more records are available** for **recent times** than for the BC era, which makes sense.
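The counts behind a histogram like this can also be computed directly. Below is a minimal sketch with made-up dates and hand-picked bin edges (both are assumptions for illustration) that counts how many records fall into each time range.

```python
import pandas as pd

# Illustrative dates; negative values stand for BCE
toy_dates = pd.Series([-5300, -3000, -900, -500, 100, 600, 1200, 1800], name="Date.From")

# Bin edges chosen by hand for this example
bins = [-6000, -4000, -2000, 0, 2000]

# Number of records per time range, ordered from oldest to most recent
per_bin = pd.cut(toy_dates, bins=bins).value_counts().sort_index()
print(per_bin)
```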

In [None]:
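import random

fig,ax = plt.subplots(figsize=(10,20),nrows=12,ncols=1)

# The 12 norm score columns (everything after Date.From and NGA)
cols = df.columns[2:14]

for i in range(12):
    # Pick a random hex colour for each subplot
    r = lambda: random.randint(0,255)
    colo = '#%02X%02X%02X' % (r(),r(),r())
    # Scores are plotted against the row index
    ax[i].plot(df[cols[i]],color=colo)
    ax[i].set_xlabel(cols[i])
    ax[i].set_title(f'{cols[i]} Plot')

# Adjust the spacing once, after all subplots are drawn
plt.tight_layout()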

*These plots show each score column against the row index, which gives a rough view of how the scores vary across the records*


**Now, let us try to plot how each column is varying over time with respect to their respective regions**

In [None]:
import random

fig,ax = plt.subplots(figsize=(10,20),nrows=10,ncols=1)

nganames = df['NGA'].unique()

for i in range(10):
    # Pick a random hex colour for the line of each region
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    # Rows belonging to the current region
    a = df.loc[df['NGA']==nganames[i]]
    ax[i].plot(a['Date.From'],a['Moralizing_norms'],color=colora,alpha=0.75, lw=1,marker='o',ls='-.',markersize=7,markerfacecolor='r',markeredgecolor='b')
    ax[i].set_xlabel('Time Periods')
    ax[i].set_ylabel('Moralizing Norms Scores')
    ax[i].set_title(f'Moralizing Norms in {nganames[i]} region')

# Adjust the spacing once, after all subplots are drawn
plt.tight_layout()


### Inferences



1. All these graphs show **Moralizing Norms in various regions** of the world at **different time periods**



2. We can clearly observe some patterns in these graphs.



3. Except in the **"Crete"** region, moralizing norms in every region have **either seen a sharp increase from a particular period of time or have existed since long ago**.



4. **"Cambodian Basin"**, **"Galilee"**, **"Kachi Plain"** and **"Latium"** are the regions in which moralizing norms existed throughout their recorded time periods.



5. In the other regions, we can see that moralizing norms **increased within a span of 500 to 1000 years**
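The point at which a norm becomes fully established can also be located programmatically. The sketch below, on a made-up frame with two placeholder regions, finds the first date at which the score reaches 1 in each region; regions that never reach 1 simply do not appear in the result.

```python
import pandas as pd

# Toy frame: region "A" reaches a full score, region "B" never does
toy = pd.DataFrame({
    "NGA": ["A"] * 4 + ["B"] * 3,
    "Date.From": [-400, -300, -200, -100, 100, 200, 300],
    "Moralizing_norms": [0, 0.5, 1, 1, 0, 0, 0],
})

# Keep only the records with a full score, then take the earliest date per region
reached = toy[toy["Moralizing_norms"] >= 1.0]
first_full = reached.groupby("NGA")["Date.From"].min()
print(first_full)
```

Since missing scores were filled with 0 earlier, a late first date on the real data may partly reflect missing records instead of an absent norm.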

Let us now draw the same kind of plots for some other interesting norms:

1. **Omniscient_supernatural_beings**
2. **Equating_elites_and_commoners**
3. **Equating_rulers_and_commoners**
4. **Impeachment**

In [None]:
import random

fig,ax = plt.subplots(figsize=(10,20),nrows=10,ncols=1)

nganames = df['NGA'].unique()

for i in range(10):
    # Pick a random hex colour for the line of each region
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    # Rows belonging to the current region
    a = df.loc[df['NGA']==nganames[i]]
    ax[i].plot(a['Date.From'],a['Omniscient_supernatural_beings'],color=colora)
    ax[i].set_xlabel('Time Periods')
    ax[i].set_ylabel('Omniscient Beings Scores')
    ax[i].set_title(f'Belief in Supernatural Beings in "{nganames[i]}" region')

# Adjust the spacing once, after all subplots are drawn
plt.tight_layout()


#### Analysis of the "Belief in Omniscient Beings" across various regions in the world gave some rather interesting results. Let us go over it in detail now.


1. In the **Cambodian Basin** region, the score is zero throughout, which suggests that people **did not believe in Omniscient Beings** at any point in time. It will be interesting to compare these graphs with those for "Rulers_not_gods" to see if the two are complementary


2. In the **Crete** region, we can see that from around **250 AD**, people slowly started to believe in Omniscient Beings


3. The **Galilee** and **Kachi Plain** regions **did not show** any **stable trend** in this respect: the various ups and downs indicate that people's belief fluctuated here.


4. **Kansai and Middle Yellow River Valley** are the same as **Cambodian Basin**, in that people did not believe in Omniscient Beings


5. In the **Upper Egypt** region, we can see that people's belief **increased slowly** over a wide range of time, and from around 100 AD people fully believed in Omniscient Beings

In [None]:
fig= plt.figure(figsize=(20,40))

nganames = df['NGA'].unique()

import random



n = 1
for i in range(10):
    
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    colorb = '#%02X%02X%02X' % (r(),r(),r())
    a = df.loc[df['NGA']==nganames[i]]
    plt.subplot(10,2,n)
    plt.plot(a['Date.From'],a['Omniscient_supernatural_beings'],color=colora)
    plt.xlabel('Time Periods')
    plt.ylabel('Omniscient Beings Scores')
    plt.title(f'Belief in Supernatural Beings in "{nganames[i]}" region')
    n+=2
    plt.tight_layout()
    
b = 2

for i in range(10):
    
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    colorb = '#%02X%02X%02X' % (r(),r(),r())
    a = df.loc[df['NGA']==nganames[i]]
    plt.subplot(10,2,b)
    plt.plot(a['Date.From'],a['Rulers_not_gods'],color=colora)
    plt.xlabel('Time Periods')
    plt.ylabel('Belief in Rulers Scores')
    plt.title(f'Belief in Rulers in "{nganames[i]}" region')
    b+=2
    plt.tight_layout()


These plots compare the scores of **"Omniscient Beings"** and **"Rulers"** (the Rulers_not_gods column) in different regions
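Whether the two norms are complementary in a region can also be checked numerically. This is a minimal sketch on made-up scores for two placeholder regions: a correlation near +1 means the two scores move together, and a correlation near -1 means one rises as the other falls.

```python
import pandas as pd

# Toy scores: the norms move together in "A" and in opposite directions in "B"
toy = pd.DataFrame({
    "NGA": ["A"] * 4 + ["B"] * 4,
    "Omniscient_supernatural_beings": [0, 0, 1, 1, 0, 0, 1, 1],
    "Rulers_not_gods": [0, 0, 1, 1, 1, 1, 0, 0],
})

# Pearson correlation between the two columns, computed separately per region
per_nga = {
    nga: group["Omniscient_supernatural_beings"].corr(group["Rulers_not_gods"])
    for nga, group in toy.groupby("NGA")
}
print(per_nga)
```

If either score is constant within a region (for example all zeros), the correlation is undefined and pandas returns NaN for that region.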

Now, let us look at the comparison of **"Equating_elites_and_commoners"** and **"Equating_rulers_and_commoners"**

In [None]:
fig= plt.figure(figsize=(20,40))

nganames = df['NGA'].unique()

import random



n = 1
for i in range(10):
    
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    colorb = '#%02X%02X%02X' % (r(),r(),r())
    a = df.loc[df['NGA']==nganames[i]]
    plt.subplot(10,2,n)
    plt.plot(a['Date.From'],a['Equating_elites_and_commoners'],color=colora)
    plt.xlabel('Time Periods')
    plt.ylabel('Equality of elites and commoners Scores')
    plt.title(f'Equality of elites and commoners in "{nganames[i]}" region')
    n+=2
    plt.tight_layout()
    
b = 2

for i in range(10):
    
    r = lambda: random.randint(0,255)
    colora = '#%02X%02X%02X' % (r(),r(),r())
    colorb = '#%02X%02X%02X' % (r(),r(),r())
    a = df.loc[df['NGA']==nganames[i]]
    plt.subplot(10,2,b)
    plt.plot(a['Date.From'],a['Equating_rulers_and_commoners'],color=colora)
    plt.xlabel('Time Periods')
    plt.ylabel('Equating commoners and Rulers Scores')
    plt.title(f'Equality of commoners and Rulers in "{nganames[i]}" region')
    b+=2
    plt.tight_layout()


### Inferences:-


1. In the **Cambodian Basin** region, the social norms of **equality between elites and commoners** and between **rulers and commoners** follow exactly the **same** trend. This can be interpreted as both norms coming into existence in these time periods, due to a variety of factors that are not known.




2. The **Kansai** region has never seen a norm of equality between rulers and commoners, while elites and commoners were treated equally from around 400 CE to 1500 CE.




3. Similar to **Kansai**, the **Middle Yellow River Valley** region has also never seen a norm of equality between rulers and commoners, while equality between elites and commoners increased from around the late 1000s CE.




4. Both norms, **equating commoners with elites and with rulers**, show a **similar increasing trend** in the region of **Susiana**.




5. In the **Upper Egypt** region, commoners were treated equally with the elites for a prolonged period of time, while the notion of treating rulers and commoners equally emerged around the late 1000s CE.

*Now, let us try to visualize the remaining columns by using some categorical plots*

In [None]:
df.columns

In [None]:
plt.figure(figsize=(15,8))
sns.boxplot(x="NGA", y="Promotion_of_prosociality",
            data=df.sort_values("NGA"))
plt.title('Prosociality in various regions',fontsize=15, fontweight='bold')
plt.xlabel('Regions',fontsize=12)
plt.ylabel('Prosociality in various regions',fontsize=12)
plt.xticks(rotation=90)
plt.show()

**"Cambodian Basin"**, **"Galilee"**, **"Kansai"** and **"Latium"** regions **differ** from the other regions with respect to the promotion of **prosociality**


**Prosociality** refers to people **behaving for the betterment of the community as a whole**. Deeds such as helping others contribute to prosociality.


1. **Cambodian Basin** is the region in which **prosociality** was **promoted** in all periods of time


2. **Galilee** and **Latium** are **similar** to the **Cambodian Basin**: the graph shows that the majority of the scores are 1, while a small minority of points lie as outliers at 0.


3. **Kansai** once again shows a **different trend** compared to all the other regions: here the norm of **prosociality is least followed**. **Yellow River Valley**, meanwhile, has most values greater than 0.5, with some exceptions.

In [None]:
cols = df['NGA'].unique()
for i in cols:
    print(i,df.loc[df['NGA']==i]['Promotion_of_prosociality'].value_counts())

In [None]:
plt.figure(figsize=(10,8))
sns.violinplot(x="NGA", y="General_applicability_of_law",
            data=df.sort_values("NGA"),palette="muted")
plt.title('Applicability of Law in various regions',fontsize=15, fontweight='bold')
plt.xlabel('Regions',fontsize=12)
plt.ylabel('Applicability of Law in various regions',fontsize=12)
plt.xticks(rotation=90)
plt.show()

In [None]:
plt.figure(figsize=(15,10))
sns.violinplot(x="NGA", y="Impeachment",
            data=df.sort_values("NGA"),palette="deep")
plt.title('Impeachment in various regions',fontsize=15, fontweight='bold')
plt.xlabel('Regions',fontsize=15)
plt.ylabel('Impeachment in various regions',fontsize=15)
plt.xticks(rotation=90)
plt.show()

In [None]:
# The 12 norm score columns to stack, in plotting order
norm_cols = ['Moralistic_punishment', 'Moralizing_norms', 'Promotion_of_prosociality',
             'Omniscient_supernatural_beings', 'Rulers_not_gods',
             'Equating_elites_and_commoners', 'Equating_rulers_and_commoners',
             'Formal_legal_code', 'General_applicability_of_law',
             'Constraint_on_executive', 'Full-time_bureaucrats', 'Impeachment']

nga_list = df['NGA'].unique()
for i in nga_list:
    nga_subdf = df[df['NGA'] == i]
    fig, ax = plt.subplots(figsize=(20,5))
    # Pass one series per norm and label each layer with its own column name
    ax.stackplot(nga_subdf['Date.From'],
                 *[nga_subdf[c] for c in norm_cols],
                 labels = norm_cols)
    ax.set_xlim(-4000, 1800)
    ax.set_title(i)
    ax.legend(loc=2)
    ax.axes.get_yaxis().set_visible(False)

In [None]:
plt.figure(figsize=(15,8))
# Correlate numeric columns only; the NGA column holds text
sns.heatmap(df.select_dtypes(include='number').corr())

## 7.  Evolution of Norms over time

In [None]:
# Correlate numeric columns only; the NGA column holds text
df.select_dtypes(include='number').corr()

1. Moralistic punishment is strongly correlated with belief in omniscient supernatural beings, so we can tentatively assume that people's belief in gods influenced moralistic punishment


2. Equating elites with commoners is highly correlated with equating rulers with commoners, matching the similar trends we already saw in various graphs


3. Formal_legal_code goes hand in hand with Moralizing_norms and General_applicability_of_law, as we can see from their correlations.


4. In the places where people believed that their rulers were supreme, the equality of elites and commoners and the equality of rulers and commoners are both nearly 0.
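Reading the strongest relationships off a full correlation table by eye is error-prone, so the pairs can be ranked instead. The following is a minimal sketch on a made-up frame with placeholder columns `a`, `b` and `c`; it keeps each pair once (the upper triangle of the matrix) and sorts the pairs by correlation.

```python
import numpy as np
import pandas as pd

# Toy score columns (illustrative values only)
toy = pd.DataFrame({
    "a": [0, 0.5, 1, 1],
    "b": [0, 0.5, 1, 1],
    "c": [1, 1, 0, 0.5],
})

corr = toy.corr()

# Mask everything except the upper triangle so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs)
```

On the real data, the same steps applied to the numeric columns of `df` would list the most strongly correlated norms at the top and the most negatively correlated ones at the bottom.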

**<center> <span style="color:blue;font-family:serif; font-size:40px;">The End </span> </center>**