About Dataset
The data is sourced from the Comparative Constitutions Project (CCP). This dataset is useful for exploratory data analysis and NLP practice.

Content
Scope — This is drawn from Elkins, Ginsburg and Melton, The Endurance of National Constitutions (Cambridge University Press, 2009). It measures the share of 70 major topics from the CCP survey that are included in any given constitution, reported in the data as a fraction between 0 and 1 (e.g. 0.67).
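The scope measure is a coverage ratio. A minimal sketch, assuming each constitution is represented by the set of survey topics it addresses; the function name `scope_score` and the integer topic IDs are hypothetical illustrations, not CCP variables:

```python
def scope_score(topics_present: set, total_topics: int) -> float:
    """Hypothetical helper: fraction of the survey's major topics a constitution covers.

    topics_present: illustrative topic IDs found in the constitution.
    total_topics: size of the topic list the measure is based on.
    """
    if total_topics <= 0:
        raise ValueError("total_topics must be positive")
    return len(topics_present) / total_topics


print(scope_score({1, 2, 3}, total_topics=4))  # 0.75
```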

Length (in Words) — This is the total number of words in the constitution, as counted by Microsoft Word.
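Microsoft Word's exact counting rules are not given here, so any reimplementation is only an approximation. A rough sketch, assuming a word is any whitespace-separated run of characters; `approx_word_count` is a hypothetical helper, and its results may differ from Word's on hyphens, dashes, or numbers:

```python
def approx_word_count(text: str) -> int:
    # Count runs of non-whitespace characters. This approximates, but is not
    # guaranteed to match, Microsoft Word's word count.
    return len(text.split())


print(approx_word_count("We, the people of Albania, proud and aware"))  # 8
```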

Executive Power — This is an additive index drawn from a working paper, Constitutional Constraints on Executive Lawmaking. The index ranges from 0 to 7 and captures the presence or absence of seven important aspects of executive lawmaking:
(1) the power to initiate legislation;
(2) the power to issue decrees;
(3) the power to initiate constitutional amendments;
(4) the power to declare states of emergency;
(5) veto power;
(6) the power to challenge the constitutionality of legislation; and
(7) the power to dissolve the legislature.

The index score is the total number of these powers granted to the national executive as a whole, whether they are assigned to the president, the prime minister, or the government.
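Because the index simply counts which of the seven powers are present, it can be sketched as a sum of booleans. The class and field names below are hypothetical labels for the seven listed powers, not CCP variable names:

```python
from dataclasses import astuple, dataclass


@dataclass
class ExecutivePowers:
    # One flag per listed aspect of executive lawmaking (hypothetical field names).
    initiate_legislation: bool = False
    issue_decrees: bool = False
    initiate_amendments: bool = False
    declare_emergency: bool = False
    veto: bool = False
    challenge_constitutionality: bool = False
    dissolve_legislature: bool = False

    def index(self) -> int:
        # Additive index: number of powers present, from 0 to 7.
        return sum(astuple(self))


powers = ExecutivePowers(initiate_legislation=True, veto=True, declare_emergency=True)
print(powers.index())  # 3
```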

Legislative Power — This captures the formal degree of power that the constitution assigns to the legislature. The indicator is drawn from Elkins, Ginsburg and Melton, The Endurance of National Constitutions (Cambridge University Press, 2009), in which we created a set of binary CCP variables to match the 32-item survey developed by M. Steven Fish and Matthew Kroenig in The Handbook of National Legislatures: A Global Survey (Cambridge University Press, 2009). The index score is the mean of the 32 binary elements, so it ranges from 0 to 1, with higher values indicating more legislative power and lower values indicating less.
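Averaging binary items is all the legislative index does. A minimal sketch, assuming the 32 survey items for one constitution arrive as a sequence of booleans; `legislative_power_index` is a hypothetical name:

```python
from typing import Sequence


def legislative_power_index(items: Sequence[bool], n_items: int = 32) -> float:
    """Hypothetical helper: mean of the binary survey items (True counts as 1)."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    return sum(items) / n_items


print(legislative_power_index([True] * 8 + [False] * 24))  # 0.25
```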

Judicial Independence — This index is drawn from a paper by Ginsburg and Melton, Does De Jure Judicial Independence Really Matter? A Reevaluation of Explanations for Judicial Independence. It is an additive index ranging from 0-6 that captures the constitutional presence or absence of six features thought to enhance judicial independence.
The six features are:
(1) whether the constitution contains an explicit statement of judicial independence;
(2) whether the constitution provides that judges have lifetime appointments;
(3) whether appointments to the highest court involve either a judicial council or two (or more) actors;
(4) whether removal is prohibited, or limited so that either a supermajority vote in the legislature is required to propose it, or only the public or a judicial council can propose it and another political actor must approve the proposal;
(5) whether removal is explicitly limited to crimes and other misconduct, treason, or violations of the constitution; and
(6) whether judicial salaries are protected from reduction.
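Like the executive index, this is a count of features present. A minimal sketch, assuming the six features are supplied as a mapping of hypothetical labels (not CCP variable names) to booleans, with missing keys treated as absent:

```python
from typing import Mapping

# Hypothetical labels for the six features listed above (not CCP variable names).
JUDICIAL_FEATURES = (
    "explicit_independence_statement",
    "lifetime_appointments",
    "council_or_multi_actor_appointment",
    "restricted_removal_procedure",
    "removal_limited_to_misconduct",
    "salary_protection",
)


def judicial_independence_index(features: Mapping[str, bool]) -> int:
    """Additive 0-6 index: one point per feature present."""
    unknown = set(features) - set(JUDICIAL_FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return sum(bool(features.get(name, False)) for name in JUDICIAL_FEATURES)


print(judicial_independence_index({"lifetime_appointments": True, "salary_protection": True}))  # 2
```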

Number of Rights — In our ongoing book project on human rights, we analyze a set of 117 different rights found in national constitutions. The rights index is the number of these rights found in a given constitution.
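The rights index is a count over a fixed catalogue. A minimal sketch, assuming rights are identified by string labels; `count_rights` and the labels in the example are hypothetical, not CCP variable names:

```python
from typing import Iterable


def count_rights(found: Iterable[str], catalogue: set) -> int:
    """Hypothetical helper: number of distinct catalogue rights present."""
    # Repeated mentions count once; labels outside the catalogue are ignored.
    return len(set(found) & set(catalogue))


catalogue = {"free_expression", "privacy", "assembly"}  # illustrative labels only
print(count_rights(["free_expression", "free_expression", "privacy", "other"], catalogue))  # 2
```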

Preamble - I extracted this column from the platform itself. It contains the text of the preamble of each nation's constitution.

In [4]:
# Import libraries
import re  # used below for punctuation and digit removal

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download the English stopword list used later in the notebook
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [5]:
# Load the dataset
df = pd.read_csv('Consti.csv')
df.head()

Unnamed: 0,Country,Year Enacted,Scope,Length (in Words),Executive Power,Legislative Power,Judicial Independence,Number of Rights,Preamble
0,Afghanistan,2004,0.67,10227,6,0.38,2,37,"In the name of Allah, the Most Beneficent, the..."
1,Albania,1998,0.61,13826,5,0.43,5,77,"We, the people of Albania, proud and aware of ..."
2,Algeria,1996,0.61,10038,7,0.29,1,36,The Algerian people are a free people; and the...
3,Andorra,1993,0.51,8740,6,0.19,3,51,"The Andorran People, with full liberty and ind..."
4,Angola,2010,0.8,27181,7,0.19,2,80,"We, the people of Angola, through its lawful r..."
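The `Unnamed: 0` column suggests the CSV was saved together with its row index. A small sketch of how pandas' `index_col` parameter handles that layout, using two rows copied from the output above into an assumed CSV layout with an unnamed first column:

```python
import io

import pandas as pd

# Assumed layout: an unnamed first column holding a previously saved index.
CSV_TEXT = (
    ",Country,Year Enacted\n"
    "0,Afghanistan,2004\n"
    "1,Albania,1998\n"
)

# Default read: the unnamed first column becomes an ordinary column.
default_cols = pd.read_csv(io.StringIO(CSV_TEXT)).columns.tolist()

# index_col=0: the first column is used as the row index instead.
indexed_cols = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0).columns.tolist()

print(default_cols)  # ['Unnamed: 0', 'Country', 'Year Enacted']
print(indexed_cols)  # ['Country', 'Year Enacted']
```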


In [28]:
# Convert text to lowercase (missing preambles become empty strings rather than NaN)
df['Processed_Preamble'] = df['Preamble'].fillna('').str.lower()
df.head()

Unnamed: 0,Country,Year Enacted,Scope,Length (in Words),Executive Power,Legislative Power,Judicial Independence,Number of Rights,Preamble,Processed_Preamble
0,Afghanistan,2004,0.67,10227,6,0.38,2,37,"In the name of Allah, the Most Beneficent, the...","in the name of allah, the most beneficent, the..."
1,Albania,1998,0.61,13826,5,0.43,5,77,"We, the people of Albania, proud and aware of ...","we, the people of albania, proud and aware of ..."
2,Algeria,1996,0.61,10038,7,0.29,1,36,The Algerian people are a free people; and the...,the algerian people are a free people; and the...
3,Andorra,1993,0.51,8740,6,0.19,3,51,"The Andorran People, with full liberty and ind...","the andorran people, with full liberty and ind..."
4,Angola,2010,0.8,27181,7,0.19,2,80,"We, the people of Angola, through its lawful r...","we, the people of angola, through its lawful r..."
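The cleaning steps that follow call `str(x)` on every value. If any preamble is missing, pandas stores it as a float NaN, and `str()` turns that into the literal word "nan", which would then survive as a token. A minimal guard, written as a hypothetical helper `normalise_preamble`:

```python
import math


def normalise_preamble(value) -> str:
    """Hypothetical helper: lowercase a preamble, mapping missing values to ''."""
    # Missing text read from a CSV arrives as float NaN; None is handled too.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).lower()


print(str(float("nan")))                       # nan  (would become a token)
print(repr(normalise_preamble(float("nan"))))  # ''
print(normalise_preamble("We, the People"))    # we, the people
```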


In [30]:
# Remove punctuation
df['Processed_Preamble'] = df['Processed_Preamble'].apply(lambda x: re.sub(r'[^\w\s]', '', str(x)))
df.head()

Unnamed: 0,Country,Year Enacted,Scope,Length (in Words),Executive Power,Legislative Power,Judicial Independence,Number of Rights,Preamble,Processed_Preamble
0,Afghanistan,2004,0.67,10227,6,0.38,2,37,"In the name of Allah, the Most Beneficent, the...",in the name of allah the most beneficent the m...
1,Albania,1998,0.61,13826,5,0.43,5,77,"We, the people of Albania, proud and aware of ...",we the people of albania proud and aware of ou...
2,Algeria,1996,0.61,10038,7,0.29,1,36,The Algerian people are a free people; and the...,the algerian people are a free people and they...
3,Andorra,1993,0.51,8740,6,0.19,3,51,"The Andorran People, with full liberty and ind...",the andorran people with full liberty and inde...
4,Angola,2010,0.8,27181,7,0.19,2,80,"We, the people of Angola, through its lawful r...",we the people of angola through its lawful rep...
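The pattern `[^\w\s]` deletes every character that is neither a word character nor whitespace. For `str` patterns in Python 3, `\w` is Unicode-aware, so accented letters survive, but deleting hyphens and apostrophes glues neighbouring words together. A comparison on an illustrative string, with substitution by a space shown as one possible alternative (not what the notebook does):

```python
import re

text = "We, the people of Côte d'Ivoire, affirm self-determination."

deleted = re.sub(r"[^\w\s]", "", text)  # what the notebook does
spaced = re.sub(r"[^\w\s]", " ", text)  # alternative: keep word boundaries

print(deleted)         # We the people of Côte dIvoire affirm selfdetermination
print(spaced.split())  # [..., 'Côte', 'd', 'Ivoire', 'affirm', 'self', 'determination']
```

Which behaviour is preferable depends on the analysis: merged forms keep compounds as single tokens, while the space variant lets "self" and "determination" be counted separately.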


In [31]:
# Remove numeric digits
df['Processed_Preamble'] = df['Processed_Preamble'].apply(lambda x: re.sub(r'\d+', '', str(x)))
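Deleting digits with `\d+` leaves the surrounding spaces in place and can leave fragments of ordinals behind. An illustration on made-up strings, where `remove_digits` is a hypothetical wrapper around the same regex:

```python
import re


def remove_digits(text: str) -> str:
    # Same pattern as the cell above: delete every run of digits.
    return re.sub(r"\d+", "", text)


for text in ["as amended in 2004 under article 12", "the 1st republic"]:
    cleaned = remove_digits(text)
    print(repr(cleaned), cleaned.split())
# 'as amended in  under article ' ['as', 'amended', 'in', 'under', 'article']
# 'the st republic' ['the', 'st', 'republic']
```

The extra spaces are harmless because later steps split on whitespace; a fragment such as "st" is not, since it becomes a token unless something filters it out.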


In [33]:
# Download stopwords
nltk.download('stopwords')

# Remove stopwords
stop_words = set(stopwords.words('english'))
df['Processed_Preamble'] = df['Processed_Preamble'].apply(
    lambda x: ' '.join([word for word in str(x).split() if word not in stop_words])
)



[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
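NLTK's English stopword list is all lowercase, so the earlier lowercasing step is what lets capitalized words such as "We" match it. The sketch below uses a tiny hand-picked stopword set, a hypothetical stand-in rather than NLTK's list, to show the effect:

```python
# Tiny hypothetical stopword set; like NLTK's list, it is lowercase only.
STOP_WORDS = {"we", "the", "of", "and", "in", "our"}


def remove_stopwords(text: str, stop_words: set) -> str:
    # Same approach as the cell above: split on whitespace, drop listed words.
    return " ".join(w for w in text.split() if w not in stop_words)


print(remove_stopwords("We the people of Albania", STOP_WORDS))  # We people Albania
print(remove_stopwords("we the people of albania", STOP_WORDS))  # people albania
```

NLTK's English list also contains negations such as "no", "nor", and "not", which can carry meaning in legal text and may be worth keeping.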


In [34]:
# Tokenize text
df['Tokens'] = df['Processed_Preamble'].apply(lambda x: str(x).split())
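Calling `str.split()` with no argument splits on runs of whitespace and ignores leading and trailing whitespace, so stray double or trailing spaces never produce empty tokens. Splitting on a single space character does:

```python
text = "people  albania proud "  # double and trailing spaces

by_whitespace = text.split()    # ['people', 'albania', 'proud']
by_space = text.split(" ")      # ['people', '', 'albania', 'proud', '']

print(by_whitespace)
print(by_space)
```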


In [35]:
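# Stemming: rule-based suffix stripping, which can yield non-words (e.g. 'people' -> 'peopl')
from nltk.stem import PorterStemmer
ps = PorterStemmer()
df['Stemmed_Tokens'] = df['Tokens'].apply(lambda tokens: [ps.stem(word) for word in tokens])

# Lemmatization: maps words to WordNet dictionary forms; needs the WordNet corpus.
# lemmatize() treats every word as a noun unless a pos argument is passed, so
# plurals are reduced ('representatives' -> 'representative') but verb forms
# such as 'resolved' are left unchanged.
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
df['Lemmatized_Tokens'] = df['Tokens'].apply(lambda tokens: [lemmatizer.lemmatize(word) for word in tokens])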
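The stemmed and lemmatized columns differ because the Porter stemmer strips suffixes by rule, while the WordNet lemmatizer looks words up in a dictionary. The stems checked below are the ones visible in the `Stemmed_Tokens` column of the `df.head()` output at the end of this notebook; the lemmatizer is left out of this sketch because it requires the WordNet corpus to be downloaded first:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["people", "history", "aware", "resolved", "liberty", "independence", "lawful"]

# Map each word to its Porter stem.
stems = {w: ps.stem(w) for w in words}
print(stems)
```

Stems like "peopl" and "histori" are not dictionary words, which is fine for counting or matching but awkward for display; the lemmatized tokens stay readable.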


In [36]:
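# Rejoin tokens into a single string.
# Note: this joins df['Tokens'], not the stemmed or lemmatized lists, so
# Final_Processed_Text is identical to Processed_Preamble. Join
# df['Lemmatized_Tokens'] (or df['Stemmed_Tokens']) to keep those steps.
df['Final_Processed_Text'] = df['Tokens'].apply(lambda tokens: ' '.join(tokens))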
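Because this cell joins `df['Tokens']`, the `Final_Processed_Text` column matches `Processed_Preamble` (compare those two columns for Angola in the output below). A sketch of the difference, using the first four tokens of the Angola row from that output:

```python
# First four tokens of the Angola row, copied from the df.head() output.
tokens = ["people", "angola", "lawful", "representatives"]
lemmatized = ["people", "angola", "lawful", "representative"]

final_from_tokens = " ".join(tokens)      # what the cell above does
final_from_lemmas = " ".join(lemmatized)  # carries lemmatization through

print(final_from_tokens)  # people angola lawful representatives
print(final_from_lemmas)  # people angola lawful representative
```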


In [37]:
df.head()

Unnamed: 0,Country,Year Enacted,Scope,Length (in Words),Executive Power,Legislative Power,Judicial Independence,Number of Rights,Preamble,Processed_Preamble,Tokens,Stemmed_Tokens,Lemmatized_Tokens,Final_Processed_Text
0,Afghanistan,2004,0.67,10227,6,0.38,2,37,"In the name of Allah, the Most Beneficent, the...",name allah beneficent merciful praise allah ch...,"[name, allah, beneficent, merciful, praise, al...","[name, allah, benefic, merci, prais, allah, ch...","[name, allah, beneficent, merciful, praise, al...",name allah beneficent merciful praise allah ch...
1,Albania,1998,0.61,13826,5,0.43,5,77,"We, the people of Albania, proud and aware of ...",people albania proud aware history responsibil...,"[people, albania, proud, aware, history, respo...","[peopl, albania, proud, awar, histori, respons...","[people, albania, proud, aware, history, respo...",people albania proud aware history responsibil...
2,Algeria,1996,0.61,10038,7,0.29,1,36,The Algerian people are a free people; and the...,algerian people free people resolved remain st...,"[algerian, people, free, people, resolved, rem...","[algerian, peopl, free, peopl, resolv, remain,...","[algerian, people, free, people, resolved, rem...",algerian people free people resolved remain st...
3,Andorra,1993,0.51,8740,6,0.19,3,51,"The Andorran People, with full liberty and ind...",andorran people full liberty independence exer...,"[andorran, people, full, liberty, independence...","[andorran, peopl, full, liberti, independ, exe...","[andorran, people, full, liberty, independence...",andorran people full liberty independence exer...
4,Angola,2010,0.8,27181,7,0.19,2,80,"We, the people of Angola, through its lawful r...",people angola lawful representatives legislato...,"[people, angola, lawful, representatives, legi...","[peopl, angola, law, repres, legisl, nation, f...","[people, angola, lawful, representative, legis...",people angola lawful representatives legislato...
