Applied Machine Learning (COMP551) - Project 1

Multilingual Dialog Dataset

To provide conversational training data in languages other than English, we propose parsing openly available theatre plays in French. For this purpose, we will curate French dialog datasets by crawling websites that aggregate openly available theatre works in a consistent, parseable format. In addition, we will parse sample interviews released by authors through free sources on the web, as well as language tutorials that feature conversations in French.

Extracted dialogs are stored in XML, where each 's' element is a conversation and each 'utt' element is an utterance.
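As a rough illustration of that layout, the sketch below reads conversations and utterances with Python's standard `xml.etree.ElementTree`. Only the 's' and 'utt' element names come from the description above; the root element name `dialog`, the sample French lines, and the helper `read_conversations` are assumptions made for this example and may not match the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The root element name "dialog" and the
# utterance text are assumptions; only 's' and 'utt' come from the README.
SAMPLE = """<dialog>
  <s>
    <utt>Bonjour, comment allez-vous ?</utt>
    <utt>Très bien, merci.</utt>
  </s>
  <s>
    <utt>Où est la gare ?</utt>
  </s>
</dialog>"""


def read_conversations(xml_text):
    """Return a list of conversations, each a list of utterance strings."""
    root = ET.fromstring(xml_text)
    conversations = []
    # Every 's' element is one conversation.
    for s in root.iter("s"):
        # Every 'utt' element inside it is one utterance.
        utterances = [(utt.text or "").strip() for utt in s.iter("utt")]
        conversations.append(utterances)
    return conversations


conversations = read_conversations(SAMPLE)
for i, conv in enumerate(conversations, start=1):
    print(f"Conversation {i}: {len(conv)} utterance(s)")
```

For a file on disk such as `spoken_french.xml`, `ET.parse(path).getroot()` could replace `ET.fromstring`, assuming the file is well-formed XML with a single root element.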

The combined corpus can be found at: https://drive.google.com/open?id=0B1ItK6JlQ6ImRXAzMm1jSU9aOTA

Folders and files:

report
results
stats
.gitignore
Interview_OUTPUT.txt
README.md
comedy-links.txt
drama-links.txt
history-links.txt
read-dramas.ipynb
read-dramas.py
read-history.ipynb
read-history.py
read_comedies.ipynb
read_comedies.py
scraping_interviews.py
scraping_spokenFrench.py
spoken_french.xml
