PythonCoderUnicorn/HarryPotterBooks

Harry Potter Books

Harry Potter books for Text Analysis.

This work was inspired by Bradley Boehmke's R package, which claimed to have clean and tidy text data. On closer inspection, the text needed further cleaning, including adding paragraphs to the end of a chapter and removing the many special characters.

This repository stores each book as a CSV file, so that anyone who wants to do text analysis can use the data without dealing with a .rda file. Each CSV is one book, and each book has 2 columns: chapter (in uppercase) and text.
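
As a rough illustration of the two-column layout, here is a minimal sketch that reads one book with Python's standard library. It assumes each CSV has a header row naming the columns `chapter` and `text`, and that the file passed in (for example `philosophers_stone.csv`) exists locally; both the header names and the file name are assumptions, so adjust them to match the actual files.

```python
import csv
from collections import defaultdict


def load_book(path):
    """Group the text rows of one book CSV by chapter.

    Assumes a header row with the column names "chapter" and "text"
    (an assumption about the files, not confirmed by the README).
    """
    chapters = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chapters[row["chapter"]].append(row["text"])
    # Plain dict; keys stay in the order chapters first appear in the file.
    return dict(chapters)


# Usage (hypothetical file name):
# book = load_book("philosophers_stone.csv")
# for chapter, rows in book.items():
#     print(chapter, len(rows))
```

Grouping by chapter keeps the structure of the book available for per-chapter analysis while leaving the text itself untouched.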

The book order:

  • philosophers_stone: Harry Potter and the Philosopher's Stone, published in 1997
  • chamber_of_secrets: Harry Potter and the Chamber of Secrets, published in 1998
  • prisoner_of_azkaban: Harry Potter and the Prisoner of Azkaban, published in 1999
  • goblet_of_fire: Harry Potter and the Goblet of Fire, published in 2000
  • order_of_the_phoenix: Harry Potter and the Order of the Phoenix, published in 2003
  • half_blood_prince: Harry Potter and the Half-Blood Prince, published in 2005
  • deathly_hallows: Harry Potter and the Deathly Hallows, published in 2007
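
The book order above can be captured in code so that analyses process the series in publication order. This is only a sketch: the identifiers and years come from the list, but the idea that each file is named after its identifier with a `.csv` extension is an assumption.

```python
# Identifiers and publication years as listed in this README.
BOOKS = [
    ("philosophers_stone", 1997),
    ("chamber_of_secrets", 1998),
    ("prisoner_of_azkaban", 1999),
    ("goblet_of_fire", 2000),
    ("order_of_the_phoenix", 2003),
    ("half_blood_prince", 2005),
    ("deathly_hallows", 2007),
]


def csv_name(identifier):
    # Assumption: each file is the identifier plus ".csv".
    return f"{identifier}.csv"


# File names sorted by publication year.
ordered_files = [csv_name(name) for name, _ in sorted(BOOKS, key=lambda b: b[1])]
```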
