In [1]:
import pandas as pd
import markovify
import re

In [2]:

# Load the dataset
df = pd.read_csv('data/ted_main.csv')

# Show the first 5 values of each column
for col in df.columns:
    print(df[col].head(5))

print(len(df))

0    4553
1     265
2     124
3     200
4     593
Name: comments, dtype: int64
0    Sir Ken Robinson makes an entertaining and pro...
1    With the same humor and humanity he exuded in ...
2    New York Times columnist David Pogue takes aim...
3    In an emotionally charged talk, MacArthur-winn...
4    You've never seen data presented like this. Wi...
Name: description, dtype: object
0    1164
1     977
2    1286
3    1116
4    1190
Name: duration, dtype: int64
0    TED2006
1    TED2006
2    TED2006
3    TED2006
4    TED2006
Name: event, dtype: object
0    1140825600
1    1140825600
2    1140739200
3    1140912000
4    1140566400
Name: film_date, dtype: int64
0    60
1    43
2    26
3    35
4    48
Name: languages, dtype: int64
0     Ken Robinson
1          Al Gore
2      David Pogue
3    Majora Carter
4     Hans Rosling
Name: main_speaker, dtype: object
0        Ken Robinson: Do schools kill creativity?
1             Al Gore: Averting the climate crisis
2                    David Pogu

In [3]:
# Get all titles and descriptions
titles = df['title']
descriptions = df['description']


# Override the 'sentence_split()' function

In [4]:
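class CustomText(markovify.Text):
    """Treat every line of the input as one sentence."""

    def sentence_split(self, text):
        # Split on newlines only, so punctuation inside a title or
        # description does not end the sentence early.
        # (markovify.NewlineText offers similar behavior out of the box.)
        return re.split(r'\n', text)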
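
To see what the override changes, the sketch below applies the same newline split to a small made-up string. The sample text and the helper name `split_on_newlines` are illustrative and not part of the notebook or the dataset. Punctuation inside a line no longer ends a sentence. Dropping blank lines is an extra step added here for tidiness.

```python
import re


def split_on_newlines(text):
    """Return one entry per non-blank line (hypothetical helper)."""
    return [line for line in re.split(r"\n", text) if line.strip()]


# Made-up sample: the first line holds two punctuation-delimited sentences,
# but it is still returned as a single entry.
sample = "Is this one title? Yes.\nDr. Example explains\n\nA third title"
parts = split_on_newlines(sample)

for part in parts:
    print(repr(part))
```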

In [5]:
# Join the entries with '\n' so each one is treated as a sentence
titles_text = '\n'.join(titles)
descriptions_text = '\n'.join(descriptions)

# Build the model.
title_model = CustomText(titles_text)
description_model = CustomText(descriptions_text)


# Print ten randomly-generated sentences
print("====Titles====")
for i in range(10):
    print(title_model.make_short_sentence(280))
    print()

print()
print()
print("=====Descriptions=====")
for i in range(10):
    print(description_model.make_sentence())
    print()

====Titles====
The Blur Building and other mosquito-borne diseases

What's wrong with what we eat to starve cancer?

Organic design, inspired by the ocean

What I learned from my autistic brothers

Biomimicry's surprising lessons from big cats

Let's not use Mars as a young rebel

Success is a strength, not a zero-sum game

Why mayors should rule the world vote for in your home -- and how to create magic

How not to be happy? Be grateful

The mysterious world of bioluminescence



=====Descriptions=====
Aaron Huey's effort to collect all the livable space on Earth? Paul Gilding suggests we calibrate our outlook on time as a given. 

We've all dreamed of flying -- but biological products, based on happiness. In an exclusive preview of the electric car.

What do these two challenges have in common is that really what the facts were known at the protein level may lead to more complex concepts. Call it Chineasy.

As a democratic revolution led by Chad Jenkins, that gives him the ability to
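
The mash-ups above come from chaining word transitions seen in the source text. The following is a simplified sketch of that idea in plain Python, not markovify's actual implementation. The sample titles, the marker strings, and the function names `build_chain` and `generate` are all made up for illustration. The state size of 2 mirrors markovify's default.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical boundary markers


def build_chain(sentences, state_size=2):
    """Map each run of `state_size` words to the words that followed it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        padded = [BEGIN] * state_size + words + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return chain


def generate(chain, rng, state_size=2, max_words=50):
    """Walk the chain from the start state until END or the word cap."""
    state = (BEGIN,) * state_size
    words = []
    while len(words) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        words.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(words)


# Made-up sample titles, not taken from the dataset
sample_titles = [
    "the hidden power of design",
    "the hidden cost of speed",
    "how the power of design shapes cities",
    "the power of design thinking",
]

chain = build_chain(sample_titles)
rng = random.Random(0)  # fixed seed so the run is repeatable
generated = generate(chain, rng)
print(generated)
```

Every word in the result follows a two-word context that occurs somewhere in the samples, which is why generated titles read locally well but can drift between topics.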

# Combining the Title part and Description part

In [6]:
contents = ["Title: " + a + " Description: " + b for a, b in zip(titles, descriptions)]

text = '\n'.join(contents)
# Build the model.
text_model = CustomText(text)

# Print ten randomly-generated sentences
for i in range(10):
    print(text_model.make_sentence())
    print()


Title: Unlock the mysteries of the Indian Police Service, she managed one of them up in our inboxes, and standard procedure is to delete on sight. But what if there's a pervasive hidden problem keeping poverty alive. Haugen reveals the deeper role theater can play in childhood makes for a great leader Description: The photo director for National Geographic, David Griffin knows the power to make the world ever more independently ... because, she suggests, are actually helping the disease spread.

Title: The quantified self Description: At TED@Cannes, Gary Wolf gives a magical performance, talks about the magic of math.

Title: The world is running out of unexpected questions and shares a fresh approach to helping these farmers lift themselves out of hospital beds.

Title: Let my dataset change your mind after watching Moschen in motion.

Title: The art of puzzles Description: At TEDGlobal 2009, he demos an idea and plans to begin a company to fight an epidemic of neurological diseases, 
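
Some generated lines above contain both markers and others lack a "Description:" part entirely. A minimal sketch for pulling the two parts back apart is shown below. The helper name `split_generated` is hypothetical, and the two test lines are copied from the output above.

```python
def split_generated(line):
    """Split 'Title: ... Description: ...' into (title, description).

    The description is None when the ' Description: ' marker is missing.
    Only the first marker is used as the split point.
    """
    body = line.strip()
    if body.startswith("Title: "):
        body = body[len("Title: "):]
    title, marker, description = body.partition(" Description: ")
    return title.strip(), (description.strip() if marker else None)


examples = [
    "Title: The quantified self Description: At TED@Cannes, Gary Wolf gives "
    "a magical performance, talks about the magic of math.",
    "Title: Let my dataset change your mind after watching Moschen in motion.",
]
parsed = [split_generated(line) for line in examples]

for title, description in parsed:
    print("Title:", title)
    print("Description:", description)
    print()
```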

# Separate

# Football Commentary

In [17]:
df = pd.read_csv('data/football_commentary.csv')
text = " ".join(df['text'])
text_model = markovify.Text(text)

# Generate sentences based on the text data
for _ in range(5):
    sentence = text_model.make_sentence()
    if sentence:
        print(sentence)

Marco Reus with a through ball.
Offside, Paris Saint Germain 1.
Julian Draxler replaces Jose Paolo Guerrero is caught offside.
Assisted by Mats Hummels.
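
The loop ran five times but printed four lines, presumably because one call to `make_sentence()` returned `None` and was skipped. When an exact number of sentences is wanted, one option is to keep calling until enough have been collected. The sketch below assumes nothing about the model beyond a callable that returns a string or `None`. The helper name `collect_sentences`, its `max_attempts` parameter, and the stand-in generator are hypothetical.

```python
import itertools


def collect_sentences(make_sentence, count, max_attempts=100):
    """Call `make_sentence` until `count` non-empty results are gathered.

    Stops after `max_attempts` calls, so the list may be shorter than `count`.
    """
    sentences = []
    for _ in range(max_attempts):
        if len(sentences) == count:
            break
        sentence = make_sentence()
        if sentence:
            sentences.append(sentence)
    return sentences


# Stand-in generator that fails every other call (not a real model)
fake_outputs = itertools.cycle([None, "Assisted by Mats Hummels."])
result = collect_sentences(lambda: next(fake_outputs), count=3)
print(result)
```

With the model from the cell above, the call would look like `collect_sentences(text_model.make_sentence, 5)`.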


# Movies

### A guessing game: a model is built from each of 5 transcripts, and we try to guess the title from the generated text. The candidates are:
- The Imitation Game
- Interstellar
- Titanic
- Forrest Gump
- Top Gear

In [51]:
with open("data/movie scripts/movie1.txt") as f:
    text = f.read()

text_model = markovify.Text(text)

for i in range(20):
    sentence = text_model.make_sentence()
    # make_sentence() can return None, so some numbers may be skipped
    if sentence:
        print(f"{i + 1}. {sentence}")

1. Y-You need me a lot easier if you weren't alone?
2. Do you know what they do not like you.
3. What makes you think we were searching -the wrong one. -What are you helping me?
4. I-I solved a-a crossword puzzle in the bloody door. -Uh, no.
5. It's really quite useful to be very good at... -Ma'am, -I'll have to promise me that you do not like you.
6. I don't want to be exact about it.
7. It's all the help you if they do to homosexuals?
8. Break the code, at least give us some more time.
9. So I can barely understand? -Uh, 23. -And you don't get to know what to feed to the lovely young ladies who tend to be able to say that it would take for us to win -the war. -Our job was to crack Enigma.
10. Peter... do you know what it's on.
11. I smiled at me a Soviet spy.
12. T... -JOAN: Ready? -Yes. -M... -M... -Y... -Y... -I... -I... -S...
13. Alan, are you saying?
14. But you've just set the record for the likes of you.
15. Do you have the life together that we need, then we'll go back to doin

In [50]:
with open("data/movie scripts/movie2.txt") as f:
    text = f.read()

text_model = markovify.Text(text)

for i in range(20):
    sentence = text_model.make_sentence()
    if sentence:
        print(f"{i + 1}. {sentence}")

1. - And Miller's is on the other side, huh?
2. Eight months to Saturn.
3. And we're growing more than just a few minutes...
4. Yeah, he's got a chance for the long nap, wait to be with other people is powerful.
6. Conserve fuel, minimize thrusting, but make sure they bring it back.
7. It's not to breathe.
8. I didn't at least three of them.
9. Miss Hanley's here to do?
10. Love is the one.
11. We must think not as individuals but as a resource, like oxygen and food.
12. came off the ground?
13. is why I can't describe it.
14. You know why we call it a whirl?
15. it'll have to talk about Murph.
16. I'm not afraid of time.
17. now you need to be with other people is powerful.
18. who I know that.
19. I'm here for me now.
20. - But they said I could use hibernation to stretch that...


In [49]:
with open("data/movie scripts/movie3.txt") as f:
    text = f.read()

text_model = markovify.Text(text)

for i in range(20):
    sentence = text_model.make_sentence()
    if sentence:
        print(f"{i + 1}. {sentence}")

1. He seems to be sent back, twice.
2. I'll never let go, and I'm going home.
3. I see it in my memory.
4. It happens Mr. Dawson here...
5. It's going to have to leave.
6. Let's see you do that.
7. ...had nothing to do it fast.
8. I have a picture of me looking like a brandy.
9. It's all right, dearie.
11. Fabri, Tommy, give me your hand.
12. You didn't come to me have disappeared.
13. 56 carats to be sent back, twice.
14. Only for a little silver one, Rose.
15. I can't see you. - What's happening, Jack? - I have it, miss.
16. By the next one.
17. Oh, it is a deep ocean of secrets.
18. What are you going?
19. If only you'd come to me have disappeared.


In [47]:
with open("data/movie scripts/movie4.txt") as f:
    text = f.read()

text_model = markovify.Text(text)

for i in range(5):
    sentence = text_model.make_sentence()
    if sentence:
        print(f"{i + 1}. {sentence}")

1. And that's all I have a ride?
2. Jenny most of all, I thought you might help me.
3. I was showing him a thing called the Hotel Ebbott.
4. - Run, you stupid or something?
5. I should be in a million sparkles on the space shuttle.


In [46]:
with open("data/movie scripts/movie5.txt") as f:
    text = f.read()

text_model = markovify.Text(text)

for i in range(10):
    sentence = text_model.make_sentence()
    if sentence:
        print(f"{i + 1}. {sentence}")

1. Look, we did agree that we have these two.
2. One thing's for sure, it is the Ferrari, the GTO, the 599 GTO.
3. Lexus normally do everything they can to disguise that their cars have been talking about the Vauxhall Vectra?
4. And nor do Formula 1 cars have to start with this.
5. Now, the thing that appears to have risen from the fuel tank directly back into the Darlington Football Club car park to the car park to the sound barrier, and runs a fleet of nuclear-powered 33 knot airports.
6. This is a comedy car, that you're being propelled by bits of metal flying around.
7. Because all that remains for us to do is very nasty.
8. Of course, nobody is watching this, which makes it a lot of noise, speed, power and tortured metaphors.
9. I like a complete hooligan.
10. If you want to wake him up.
