# Quickstart Guide

Congratulations on successfully installing Sequenzo! 🎉 You're now ready to explore social sequence analysis with ease.

In this guide, we'll walk you through analyzing country-level CO₂ emissions sequences step by step. If you're curious about how we transformed the original dataset into a format suitable for sequence analysis, you can find the detailed explanation here.

New to Python? No worries! We've designed Sequenzo to be intuitive and beginner-friendly, so you can jump right in regardless of whether you are a newbie or a seasoned Python coder. 

By the end of this tutorial, you'll learn how to:

1. Install Sequenzo
2. Load and explore a dataset
3. Analyze social sequences
4. Visualize the results

Now, let's get started on this exciting journey! 🐍✨


## 1. Import packages, load the data, and get a rough idea of the data

```python
import pandas as pd
import sequenzo            # needed for the sequenzo.<function> calls below
from sequenzo import *     # brings SequenceData and the plotting functions into scope

# List all the available datasets in Sequenzo
print('Available datasets in Sequenzo: ', sequenzo.list_datasets())

# Load the data that we would like to explore in this tutorial
df = sequenzo.load_dataset('country_CO2_emissions')

# Show the dataframe
df
```

Available datasets in Sequenzo:  ['country_income', 'country_life_expectancy', 'country_CO2_emissions']


|  | Country | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | ... | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | ... | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low |
| 1 | Albania | Middle | Low | Low | Low | Low | Low | Low | Low | Low | ... | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle |
| 2 | Algeria | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | ... | Middle | Middle | Middle | Middle | High | High | High | High | High | High |
| 3 | Andorra | Very High | Very High | High | High | High | High | High | High | Very High | ... | High | High | High | High | High | High | High | High | High | High |
| 4 | Angola | Low | Low | Low | Low | Low | Low | Low | Low | Low | ... | Low | Low | Low | Low | Low | Low | Low | Low | Low | Low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 185 | Venezuela | High | High | High | High | High | High | High | High | High | ... | High | High | High | High | High | High | High | High | High | High |
| 186 | Vietnam | Very Low | Very Low | Very Low | Very Low | Very Low | Low | Low | Low | Low | ... | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle | Middle |
| 187 | Yemen | Low | Low | Low | Low | Low | Low | Low | Low | Low | ... | Low | Low | Low | Low | Low | Low | Very Low | Very Low | Very Low | Very Low |
| 188 | Zambia | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | ... | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Very Low | Low | Low | Very Low |


The five levels are quintiles of CO₂ emissions per capita, with cut-offs computed from the values across all years:

* Very Low (Bottom 20%)
* Low (20-40%)
* Middle (40-60%)
* High (60-80%)
* Very High (Top 20%)
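One way to produce such quintile labels is sketched below. It assumes the cut-offs come from pooling every country-year value, since the guide does not spell out the exact procedure. The sketch applies pandas' `pd.qcut` to a small set of made-up per-capita values. It then reshapes the result into the same wide layout as the tutorial data, with one row per country and one column per year.

```python
import pandas as pd

# Hypothetical per-capita CO2 values for five countries over two years.
# These numbers are made up for illustration only.
values = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "Year": [1990, 1991] * 5,
    "CO2_per_capita": [0.1, 0.2, 1.0, 1.2, 3.0, 3.5, 6.0, 7.0, 12.0, 15.0],
})

labels = ["Very Low", "Low", "Middle", "High", "Very High"]

# Assumption: pool all country-year values, then cut the pooled
# distribution into five equal-frequency bins (quintiles)
values["Level"] = pd.qcut(values["CO2_per_capita"], q=5, labels=labels).astype(str)

# Reshape to one row per country and one column per year
wide = values.pivot(index="Country", columns="Year", values="Level")
print(wide)
```

Because the cut-offs are shared across years, a country's label can change only when its own value crosses a pooled threshold. This is what makes labels from different years comparable.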

For instance, Andorra's row in the output above shows:

* 1990-1991: Started at the "Very High" level, i.e., among the top 20% of per-capita values
* 1992-1997: Dropped to "High" level (60-80th percentile)
* 1998: Brief return to "Very High" level
* 2000s onwards: Stabilized at "High" level (60-80th percentile) and maintained this classification through 2019
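Reading a row this way amounts to collapsing consecutive identical states into *spells*. The sketch below does this with the standard library's `itertools.groupby`, applied to Andorra's 1990-1998 states from the table above. `to_spells` is a hypothetical helper written for illustration and is not part of Sequenzo.

```python
from itertools import groupby

# Andorra's states for the years visible in the output above (1990-1998)
andorra = {
    1990: "Very High", 1991: "Very High", 1992: "High", 1993: "High",
    1994: "High", 1995: "High", 1996: "High", 1997: "High", 1998: "Very High",
}

def to_spells(sequence):
    """Collapse consecutive identical states into (start_year, end_year, state) spells."""
    spells = []
    # groupby starts a new group whenever the state changes between consecutive years
    for state, run in groupby(sorted(sequence.items()), key=lambda item: item[1]):
        years = [year for year, _ in run]
        spells.append((years[0], years[-1], state))
    return spells

for start, end, state in to_spells(andorra):
    print(f"{start}-{end}: {state}")
```

This prints `1990-1991: Very High`, `1992-1997: High`, and `1998-1998: Very High`, which matches the first three bullets.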

```python
# Filter the data for Andorra
andorra_data = df[df['Country'] == 'Andorra']

andorra_data
```

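|  | Country | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | ... | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Andorra | Very High | Very High | High | High | High | High | High | High | Very High | ... | High | High | High | High | High | High | High | High | High | High |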


What if we want to analyze the sequences of all countries together? 🤔 That is where Sequenzo comes in!

## 2. Analyze Social Sequences with Sequenzo

```python
# Use every column after 'Country' (1990-2019) as a time point
time = list(df.columns)[1:]

# List the states in their natural order, from lowest to highest emissions
states = ['Very Low', 'Low', 'Middle', 'High', 'Very High']

# Create a SequenceData object
sequence_data = SequenceData(df, time=time, states=states)

sequence_data
```


✅ SequenceData initialized successfully! Here's a summary:
🔍 Number of sequences: 190
📏 Min/Max sequence length: 18 / 30
🔤 Alphabet: ['High', 'Low', 'Middle', 'Very High', 'Very Low']


SequenceData(190 sequences, Alphabet: ['High', 'Low', 'Middle', 'Very High', 'Very Low'])

Now we have successfully converted our original dataframe into a `SequenceData` object, which is the core object in Sequenzo for analyzing social sequences.
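To make the summary concrete, the sketch below recomputes the same three figures with plain pandas on a made-up three-country frame. It assumes two things. First, a sequence's length counts only the years with an observed state, which would explain a minimum of 18 versus a maximum of 30. Second, the alphabet is listed in sorted order, as in the output above. The sketch illustrates what the numbers mean, not how `SequenceData` computes them internally.

```python
import pandas as pd

# A tiny, made-up wide dataframe in the same layout as the tutorial data.
# None marks a year with no observed state.
toy = pd.DataFrame({
    "Country": ["X", "Y", "Z"],
    "1990": ["Low", None, "High"],
    "1991": ["Low", "Middle", "High"],
    "1992": ["Middle", "Middle", "Very High"],
})

time_cols = list(toy.columns)[1:]

# One sequence per row (country)
n_sequences = len(toy)

# Assumed definition: length = number of non-missing time points per row
lengths = toy[time_cols].notna().sum(axis=1)

# Alphabet = distinct observed states, sorted alphabetically
alphabet = sorted({v for v in toy[time_cols].to_numpy().ravel() if pd.notna(v)})

print(f"Number of sequences: {n_sequences}")
print(f"Min/Max sequence length: {lengths.min()} / {lengths.max()}")
print(f"Alphabet: {alphabet}")
```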

By eye, we can realistically follow only one country's sequence at a time. Sequenzo lets us analyze the sequences of all countries at once, and **visualization** is the natural place to start, because it reveals patterns and trends across the whole dataset.

Among the various visualization methods, one of the most commonly used is the **index plot**. Let's see how it works.

```python
# Plot the index plot: one horizontal line per country, one colored cell per year
plot_sequence_index(sequence_data)
```
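An index plot draws each sequence as one horizontal row, with one colored cell per time point, and stacks the rows so the whole dataset can be scanned at once. The sketch below is a rough, text-only stand-in that prints one character per year instead of a colored cell. It uses the 1990-1998 states of four countries from the table above. The `text_index_plot` function and its symbol mapping are hypothetical and unrelated to Sequenzo's own plotting code.

```python
# Map each state to one character, ordered from lowest to highest emissions
SYMBOLS = {"Very Low": ".", "Low": "-", "Middle": "=", "High": "#", "Very High": "@"}

# 1990-1998 states of four countries, taken from the table above
sequences = {
    "Afghanistan": ["Very Low"] * 9,
    "Albania": ["Middle"] + ["Low"] * 8,
    "Andorra": ["Very High"] * 2 + ["High"] * 6 + ["Very High"],
    "Vietnam": ["Very Low"] * 5 + ["Low"] * 4,
}

def text_index_plot(seqs, symbols):
    """Return one text row per sequence; each time point becomes one symbol."""
    width = max(len(name) for name in seqs)
    rows = []
    for name, states in seqs.items():
        # Left-align the label so every row's cells start in the same column
        rows.append(f"{name:<{width}} |" + "".join(symbols[s] for s in states))
    return rows

for row in text_index_plot(sequences, SYMBOLS):
    print(row)
```

Even at this scale, stable rows (Afghanistan) stand out from rows that change (Andorra, Vietnam). A real index plot makes the same contrast visible across all countries.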

What if we want to know how the mix of states changes over time? For that, we can use the **state distribution plot**.

```python
# Plot the state distribution
plot_state_distribution(sequence_data)
```
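A state distribution plot summarizes the data differently: for each time point, it shows what share of all sequences is in each state. The sketch below computes those shares with `collections.Counter` for the four countries whose 1990 and 1991 states are visible in the table above. `state_distribution` is a hypothetical helper. A real plot would draw these shares for every year, typically as stacked bars or areas.

```python
from collections import Counter

STATES = ["Very Low", "Low", "Middle", "High", "Very High"]

# States of Afghanistan, Albania, Algeria and Andorra, taken from the table above
by_year = {
    1990: ["Very Low", "Middle", "Middle", "Very High"],
    1991: ["Very Low", "Low", "Middle", "Very High"],
}

def state_distribution(states_at_t, alphabet):
    """Share of sequences in each state at a single time point."""
    counts = Counter(states_at_t)
    total = len(states_at_t)
    # Include every state, even those with zero occurrences, so years are comparable
    return {s: counts[s] / total for s in alphabet}

for year, states_at_t in by_year.items():
    print(year, state_distribution(states_at_t, STATES))
```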