# Module 5: Practice Exercises

Welcome to the practice exercises for Module 5. They test your understanding of the core concepts in Natural Language Processing (NLP) and time series analysis.

**Instructions:**
1.  For each problem, read the description and write the necessary code in the cell below it.
2.  Run the cell to see the output and check if your solution is correct.
3.  Solutions are provided at the end of the notebook for you to compare against.

---

### Exercise 1: NLP Pre-processing Pipeline

**Task:** You are given a raw text sentence. Perform the complete NLP pre-processing pipeline on it:
1.  Tokenize the sentence.
2.  Convert all tokens to lowercase.
3.  Remove all stop words and punctuation.
4.  Lemmatize the remaining tokens.

Print the final list of clean, lemmatized tokens.

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Ensure NLTK data is downloaded
nltk.download('punkt', quiet=True)
# Recent NLTK releases look for 'punkt_tab' rather than 'punkt'; requesting both is harmless
nltk.download('punkt_tab', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

raw_text = "The quick brown foxes are skillfully jumping over the lazy dogs."

# Your code here
```

### Exercise 2: Setting a DatetimeIndex in Pandas

**Task:** You are given a Pandas DataFrame, `df_visitors`, whose 'Day' column contains dates as strings. Convert this column to datetime values and set it as the index, so that the DataFrame has a proper `DatetimeIndex` suitable for time series analysis. Call `df_visitors.info()` to verify that the index has changed.

```python
import pandas as pd

data = {
    'Day': ['2023-10-01', '2023-10-02', '2023-10-03'],
    'Visitors': [150, 165, 158]
}
df_visitors = pd.DataFrame(data)

# Your code here
```

---

## Solutions

**Solution 1:**
```python
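import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Ensure NLTK data is downloaded (same resources as in the exercise cell)
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)  # looked up by recent NLTK releases
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

raw_text = "The quick brown foxes are skillfully jumping over the lazy dogs."

# Steps 1 and 2: lowercase the text, then tokenize it
tokens = word_tokenize(raw_text.lower())

# Step 3: remove stop words and punctuation
# (isalpha() drops any token that is not purely alphabetic, such as '.')
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.isalpha() and word not in stop_words]

# Step 4: lemmatize
# lemmatize() treats every token as a noun by default, so 'foxes' becomes 'fox'
# while 'jumping' is left unchanged
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]

print(lemmatized_tokens)
# ['quick', 'brown', 'fox', 'skillfully', 'jumping', 'lazy', 'dog']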
```

**Solution 2:**
```python
import pandas as pd
data = {
    'Day': ['2023-10-01', '2023-10-02', '2023-10-03'],
    'Visitors': [150, 165, 158]
}
df_visitors = pd.DataFrame(data)

# Convert 'Day' column to datetime objects
df_visitors['Day'] = pd.to_datetime(df_visitors['Day'])

# Set the 'Day' column as the index
df_visitors = df_visitors.set_index('Day')

# The summary should now report a DatetimeIndex with 3 entries
df_visitors.info()
```
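
To see why the `DatetimeIndex` matters, here is a minimal sketch of two operations it makes convenient: selecting rows with date strings and computing a day-over-day change. The variable names `is_datetime_index`, `first_two_days`, and `daily_change` are introduced here for illustration only.

```python
import pandas as pd

# Rebuild the DataFrame from the exercise and give it a DatetimeIndex
df_visitors = pd.DataFrame({
    'Day': ['2023-10-01', '2023-10-02', '2023-10-03'],
    'Visitors': [150, 165, 158],
})
df_visitors['Day'] = pd.to_datetime(df_visitors['Day'])
df_visitors = df_visitors.set_index('Day')

# Check the index type directly instead of reading the info() summary
is_datetime_index = isinstance(df_visitors.index, pd.DatetimeIndex)
print(is_datetime_index)

# Slice by date strings; with label-based slicing both endpoints are included
first_two_days = df_visitors.loc['2023-10-01':'2023-10-02']
print(first_two_days)

# Difference between each day and the previous one (the first value is NaN)
daily_change = df_visitors['Visitors'].diff()
print(daily_change)
```

The `isinstance` check is a programmatic alternative to inspecting the `info()` output by eye.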