
One of the challenges in the final project is finding a way to output, store, and ultimately display your data in a clean, easy-to-read format. DH Lab 5 gives you an opportunity to think about how you want to tackle that in your project.

First, here's a demo of a few different formatting options using Sonnet 129. In each one, we display the line, track its enjambment, and record a few other metrics about it.

In [None]:
sonnet_129 = """The expense of spirit in a waste of shame
Is lust in action; and till action, lust
Is perjured, murderous, bloody, full of blame,
Savage, extreme, rude, cruel, not to trust,
Enjoyed no sooner but despisèd straight,
Past reason hunted; and, no sooner had
Past reason hated as a swallowed bait
On purpose laid to make the taker mad;
Mad in pursuit and in possession so,
Had, having, and in quest to have, extreme;
A bliss in proof and proved, a very woe;
Before, a joy proposed; behind, a dream.
    All this the world well knows; yet none knows well
    To shun the heaven that leads men to this hell."""

# note the triple quotes: they preserve the line breaks in the sonnet

In [None]:
print(sonnet_129)

The expense of spirit in a waste of shame
Is lust in action; and till action, lust
Is perjured, murderous, bloody, full of blame,
Savage, extreme, rude, cruel, not to trust,
Enjoyed no sooner but despisèd straight,
Past reason hunted; and, no sooner had
Past reason hated as a swallowed bait
On purpose laid to make the taker mad;
Mad in pursuit and in possession so,
Had, having, and in quest to have, extreme;
A bliss in proof and proved, a very woe;
Before, a joy proposed; behind, a dream.
    All this the world well knows; yet none knows well
    To shun the heaven that leads men to this hell.

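One pitfall with triple-quoted strings: if the closing quotes sit on their own line, the string ends with a newline, and `split("\n")` then produces an empty final line. Code that reads `line[-1]`, like the cells below, would raise an `IndexError` on it. The sonnet above avoids this by closing the quotes on the last line. A minimal sketch, using a two-line excerpt (the variable names are illustrative), of how `str.splitlines()` and `strip()` sidestep the problem and also remove the couplet's indentation:

```python
# Triple-quoted excerpt whose closing quotes are on their own line,
# so the string ends with "\n" (unlike sonnet_129 above).
couplet = """    All this the world well knows; yet none knows well
    To shun the heaven that leads men to this hell.
"""

# split("\n") keeps an empty string after the trailing newline
print(couplet.split("\n")[-1] == "")

# splitlines() does not produce that empty string; strip() removes the
# leading spaces, and the `if` filter drops any blank lines
clean_lines = [line.strip() for line in couplet.splitlines() if line.strip()]
print(clean_lines)
```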

In [None]:
import spacy

# Load the small spaCy English model
nlp = spacy.load("en_core_web_sm")

# Split text into lines
lines = sonnet_129.split("\n")

# Initialize results
results = []

# Process each line except the last, which has no following line to run on into
for i in range(len(lines) - 1):
    line = lines[i].strip()
    if not line:
        continue  # skip blank lines, which have no final character to check

    # Parse the line with spaCy
    doc = nlp(line)

    # Enjambed unless the line ends in punctuation
    enjambed = "no" if line[-1] in ".;:!?," else "yes"

    # Rough check for substantial enjambment: flag the line if its last token's
    # dependency label is cc (coordinating conjunction), prep (preposition),
    # conj (conjunct), or det (determiner)
    last_token = doc[-1]
    substantial = "yes" if last_token.dep_ in ["cc", "prep", "conj", "det"] else "no"

    # Append results
    results.append({"Line": line, "Enjambed": enjambed, "Substantial Enjambment": substantial})

# Print results
for result in results:
    print(result)


{'Line': 'The expense of spirit in a waste of shame', 'Enjambed': 'yes', 'Substantial Enjambment': 'no'}
{'Line': 'Is lust in action; and till action, lust', 'Enjambed': 'yes', 'Substantial Enjambment': 'yes'}
{'Line': 'Is perjured, murderous, bloody, full of blame,', 'Enjambed': 'no', 'Substantial Enjambment': 'no'}
{'Line': 'Savage, extreme, rude, cruel, not to trust,', 'Enjambed': 'no', 'Substantial Enjambment': 'no'}
{'Line': 'Enjoyed no sooner but despisèd straight,', 'Enjambed': 'no', 'Substantial Enjambment': 'no'}
{'Line': 'Past reason hunted; and, no sooner had', 'Enjambed': 'yes', 'Substantial Enjambment': 'yes'}
{'Line': 'Past reason hated as a swallowed bait', 'Enjambed': 'yes', 'Substantial Enjambment': 'no'}
{'Line': 'On purpose laid to make the taker mad;', 'Enjambed': 'no', 'Substantial Enjambment': 'no'}
{'Line': 'Mad in pursuit and in possession so,', 'Enjambed': 'no', 'Substantial Enjambment': 'no'}
{'Line': 'Had, having, and in quest to have, extreme;', 'Enjambed': 
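The punctuation test in this cell is repeated in the later cells too. A minimal sketch of pulling it into one reusable function; the names `is_end_stopped`, `enjambment_label`, and `END_PUNCTUATION` are hypothetical, and the punctuation set (which adds `:` to the original list) is an assumption you may want to adjust for your texts:

```python
# Hypothetical helper: treat a line as end-stopped if its last visible
# character is punctuation. The punctuation set is an assumption.
END_PUNCTUATION = ".,;:!?"

def is_end_stopped(line: str) -> bool:
    """Return True if the stripped line ends in punctuation."""
    stripped = line.strip()
    # An empty line has no final character, so treat it as not end-stopped
    return bool(stripped) and stripped[-1] in END_PUNCTUATION

def enjambment_label(line: str) -> str:
    """Return 'no' for end-stopped lines and 'yes' otherwise, like the cell above."""
    return "no" if is_end_stopped(line) else "yes"

print(enjambment_label("The expense of spirit in a waste of shame"))
print(enjambment_label("Is perjured, murderous, bloody, full of blame,"))
```

Keeping the rule in one place means that if you change what counts as end-stopped, every table built from it changes consistently.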

In [None]:
## run this cell to install prosodic; if Colab asks you to restart the runtime, do so and then rerun this cell

!apt-get install -y espeak libespeak1 libespeak-dev
!pip install -U prosodic

import prosodic
prosodic.logger.setLevel('ERROR')

# These are the meter parameters (constraint names mapped to weights).
# To experiment with different rules or values, edit them here, then rerun this cell and the cells that use them.

constraints={
    'w_peak':3.0,
    'w_stress':1.0,
    's_unstress':1.0,
    'unres_across':1.0,
    'unres_within':1.0,
    'pentameter':20.0,
}
meter = prosodic.Meter(
    constraints=constraints,
    resolve_optionality=True,
    max_s=1,
    max_w=2,
)

meter.to_dict()


In [None]:
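linesProsodic = prosodic.Text(sonnet_129)

# Parse the whole sonnet
# (note: the `meter` object defined above is not passed to parse() here;
# see the prosodic documentation for how to parse with a custom meter)
parse = linesProsodic.parse()

# Store the results in a DataFrame so we can do more with them
df = parse.stats()

df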


stanza_num,line_num,line_txt,linepart_num,parse_rank,parse_txt,parse_meter,parse_stress,sent_num,sentpart_num,parse_score,parse_num_viols,parse_ambig,parse_is_bounded,parse_num_sylls,parse_num_words,*w_peak,*w_stress,*s_unstress,*unres_across,...,*total_sylls,*total,*w_peak_norm,*w_stress_norm,*s_unstress_norm,*unres_across_norm,*unres_within_norm,*foot_size_norm,*total_sylls_norm,*total_norm
1,1,Th' expense of spirit in a waste of shame,1,1,th'.ex PENSE of SPI rit IN a WASTE of SHAME,--+-+-+-+-+,--+-+-+-+-+,1,1,1.0,1,2,0,11,9,0.0,0.0,0.0,1.0,...,1,1,0.0,0.000000,0.000000,0.090909,0.0,0.0,0.090909,0.090909
1,2,"Is lust in action; and till action, lust",2,1,is LUST in AC tion and.till AC tion,-+-+---+-,-+-+---+-,1,1,0.0,0,1,0,9,7,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,3,"Is perjured, murd'rous, bloody, full of blame,",5,1,is PER jured MURD 'rous BLOO dy FULL of BLAME,-+-+-+-+-+,-+-+-+-+-+,1,3,0.0,0,1,0,10,7,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,4,"Savage, extreme, rude, cruel, not to trust,",9,1,SA vage ex TREME CR uel NOT to TRUST,+--++-+-+,+--++-+-+,1,7,0.0,0,4,0,9,6,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,5,"Enjoyed no sooner but despisèd straight,",14,1,en JOYED no SOO ner BUT des PISÈD straight,-+-+-+-+-,-+++---++,1,12,3.0,3,2,0,9,6,0.0,2.0,1.0,0.0,...,3,3,0.0,0.222222,0.111111,0.000000,0.0,0.0,0.333333,0.333333
1,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1,10,"Had, having, and in quest to have, extreme;",21,1,had HA ving and.in QUEST to HAVE ex TREME,-+---+-+-+,-+---+-+-+,1,17,0.0,0,4,0,10,8,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,11,"A bliss in proof and proved, a very woe;",25,1,a BLISS in PROOF and PROVED a VE ry WOE,-+-+-+-+-+,-+-+-+-+-+,1,21,0.0,0,1,0,10,9,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,12,"Before, a joy proposed; behind, a dream.",27,1,be FORE a JOY pro POSED be HIND a DREAM,-+-+-+-+-+,-+-+-+-+-+,1,23,0.0,0,2,0,10,7,0.0,0.0,0.0,0.0,...,0,0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000
1,13,All this the world well knows; yet none knows well,31,1,ALL this.the WORLD well KNOWS yet NONE knows WELL,+--+-+-+-+,+--+++-+++,2,27,2.0,2,1,0,10,10,0.0,2.0,0.0,0.0,...,2,2,0.0,0.200000,0.000000,0.000000,0.0,0.0,0.200000,0.200000


This is a lot of information. Let's clean it up a bit. We can extract just the parse score and add our enjambment information.
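The cell below pairs rows of `df` with sonnet lines by position. Another option is to turn the index levels of `df` back into ordinary columns with `reset_index()` and keep only the ones you need, so each score stays attached to its own line number and text. A minimal sketch on a small stand-in DataFrame built from a few rows of the output above (`df_demo` and `tidy` are illustrative names, and the real `df` has more index levels and columns). It also computes violations per syllable; for these rows that matches the `*total_norm` column above (1/11 ≈ 0.0909, 3/9 ≈ 0.3333), though that is an observation from this output rather than documented behavior:

```python
import pandas as pd

# Stand-in for prosodic's stats() output: three rows copied from the table
# above, with only some of its index levels and columns.
index = pd.MultiIndex.from_tuples(
    [
        (1, 1, "Th' expense of spirit in a waste of shame"),
        (1, 2, "Is lust in action; and till action, lust"),
        (1, 5, "Enjoyed no sooner but despisèd straight,"),
    ],
    names=["stanza_num", "line_num", "line_txt"],
)
df_demo = pd.DataFrame(
    {"parse_score": [1.0, 0.0, 3.0], "parse_num_sylls": [11, 9, 9]},
    index=index,
)

# Move the index levels into regular columns, then keep only what we need
tidy = df_demo.reset_index()[["line_num", "line_txt", "parse_score", "parse_num_sylls"]].copy()

# Violations per syllable, so lines of different lengths can be compared
tidy["score_per_syll"] = (tidy["parse_score"] / tidy["parse_num_sylls"]).round(4)
print(tidy)
```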

In [None]:
import pandas as pd
# Extract the parse score column
df_parse_scores = df[['parse_score']]

# Prepare data for the new DataFrame
data = []

# This pairs rows of df with sonnet lines by position, so it assumes df has
# exactly one row per line, in order (as in the output above)
for i in range(len(lines) - 1):
    line = lines[i].strip()

    # Parse the line with spaCy
    doc = nlp(line)

    # Enjambed unless the line ends in punctuation
    enjambed = "no" if line[-1] in ".;:!?," else "yes"

    # Hard enjambment: the last token's dependency label is cc, prep, conj, or det
    last_token = doc[-1]
    hard_enjambment = "yes" if last_token.dep_ in ["cc", "prep", "conj", "det"] else "no"

    # Add the results to the data list
    data.append({
        "The line": line,
        "Parse score": df_parse_scores.iloc[i]["parse_score"],
        "Enjambed?": enjambed,
        "Hard enjambment?": hard_enjambment
    })

# Create a new DataFrame
df_results = pd.DataFrame(data)

# Display the DataFrame (as the last expression in a cell, it renders as a formatted table)
df_results



Want to make your table more dynamic? Try Plotly, shown below!

In [None]:
import plotly.graph_objects as go

# Reuse df_results, the DataFrame built in the previous cell

# Create an interactive table using Plotly
fig = go.Figure(data=[
    go.Table(
        header=dict(
            values=["<b>The Line</b>", "<b>Parse Score</b>", "<b>Enjambed?</b>", "<b>Hard Enjambment?</b>"],
            fill_color="lightblue",
            align="left"
        ),
        cells=dict(
            values=[
                df_results["The line"],
                df_results["Parse score"],
                df_results["Enjambed?"],
                df_results["Hard enjambment?"]
            ],
            fill_color="white",
            align="left"
        )
    )
])

# Show the interactive table
fig.show()



These are just a few kinds of tables and calculations you can show. Once your data is organized into a DataFrame, there are many more functions you can run on it, such as sorting, filtering, counting, and grouping.
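A minimal sketch of a few of those operations, on a small stand-in table whose values are copied from the outputs above (lines 1 to 5 of Sonnet 129, their prosodic parse scores, and the spaCy-based enjambment labels). The name `df_demo` is illustrative; the column names match `df_results`, so the same calls should work on it:

```python
import pandas as pd

# Stand-in results table using values from the outputs above
df_demo = pd.DataFrame({
    "The line": [
        "The expense of spirit in a waste of shame",
        "Is lust in action; and till action, lust",
        "Is perjured, murderous, bloody, full of blame,",
        "Savage, extreme, rude, cruel, not to trust,",
        "Enjoyed no sooner but despisèd straight,",
    ],
    "Parse score": [1.0, 0.0, 0.0, 0.0, 3.0],
    "Enjambed?": ["yes", "yes", "no", "no", "no"],
    "Hard enjambment?": ["no", "yes", "no", "no", "no"],
})

# Sort: most metrically complex lines (highest parse score) first
by_score = df_demo.sort_values("Parse score", ascending=False)

# Filter: keep only enjambed lines
enjambed = df_demo[df_demo["Enjambed?"] == "yes"]

# Count: how many lines fall into each enjambment category
counts = df_demo["Enjambed?"].value_counts()

# Group: average parse score for enjambed vs. end-stopped lines
mean_scores = df_demo.groupby("Enjambed?")["Parse score"].mean()

print(by_score.head(2))
print(enjambed)
print(counts)
print(mean_scores)
```

With only five lines these numbers say little about the poem, but the same calls scale to a whole corpus.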

Your turn! Think about the texts and data objects you need in your final project. In the example above, the top-level object was the sonnet, and each line was a row; the columns tracked enjambment and metrical complexity. What are your rows and columns? You will need to display them. Do your best to create a DataFrame (or another data structure, if that is more useful!) and display it. If you struggle to get the code running, flag that for future work.
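If your project covers several texts, one common layout is one row per line, with a column recording which text it came from, so you can later filter or group by text. A minimal standard-library sketch, assuming your texts are stored in a dictionary keyed by title; `poem_to_rows`, `texts`, and the column names are hypothetical placeholders for your own. It also stores the rows as CSV, which pandas can read back with `pd.read_csv`:

```python
import csv
import io

def poem_to_rows(title: str, text: str) -> list[dict]:
    """Turn one poem into a list of row dicts, one per non-blank line."""
    # Strip indentation and drop blank lines (e.g. between stanzas)
    lines = [raw.strip() for raw in text.splitlines() if raw.strip()]
    rows = []
    for line_num, line in enumerate(lines, start=1):
        rows.append({
            "title": title,
            "line_num": line_num,
            "line": line,
            "num_words": len(line.split()),
            # Same punctuation heuristic as the cells above, plus ":"
            "end_stopped": line[-1] in ".,;:!?",
        })
    return rows

# Hypothetical corpus: add your own texts here, keyed by title
texts = {
    "Sonnet 129 (couplet)": """All this the world well knows; yet none knows well
To shun the heaven that leads men to this hell.""",
}

all_rows = []
for title, text in texts.items():
    all_rows.extend(poem_to_rows(title, text))

# Store the rows as CSV in memory; to write a file instead, use
# open("results.csv", "w", newline="") in place of io.StringIO()
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(all_rows[0].keys()))
writer.writeheader()
writer.writerows(all_rows)
print(buffer.getvalue())
```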

In [None]:
## your code here

Final Reflection: What is the primary challenge you face when it comes to assembling and/or displaying data? Do you know what your data looks like but need help formatting or analyzing it? Or are you unsure what the data should look like? Sometimes there is no "right" or even "best" structure, but it is important to notice when that is the case.

Add your reflection in the text box below.

YOUR REFLECTION