# Weeks 5 - 6: Text as Data, Part 3 (Sentiment Analysis and Unit Project)

<img src="https://upload.wikimedia.org/wikipedia/commons/8/8b/Network_visualisation_incorporating_sentiment_analysis_of_the_subreddit_%27skeptic%27_from_Reddit.png" width="750" height="540">


As we learned in the last chunk of work, [Digital Humanities](https://en.wikipedia.org/wiki/Digital_humanities) is a field of study that applies computational tools to the study of the humanities. You explored texts of various kinds using (mostly) word frequencies and then visualized (and shared) your findings. In this last chunk of work in the unit, *Text as Data*, we will go beyond word frequencies to explore [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) and networks of words ([n-grams](https://en.wikipedia.org/wiki/N-gram)) in order to discern underlying and inferential meanings in texts. You will then create and carry out a project that demonstrates (and hopefully extends) your newly minted text analysis skills.


## Work for Weeks 5-6 - Sentiment Analysis and N-Grams



### Chunk 1: Sentiment Analysis in R with TidyText

<img src="https://www.tidytextmining.com/02-sentiment-analysis_files/figure-html/sentimentplot-1.png" width="650" height="340">

A friend and colleague, Tom Liam Lynch, is a former high school English teacher and teacher educator, who has focused on making computationally supported textual analysis available to English teachers. Part of his work has included the development of [Plotting Plots](https://plottingplots.com/), a website where teachers and students can explore texts computationally.

Here is the work for this Chunk:

1. *Reading*. Our friends at Towards Data Science have created [this good introduction](https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17) to Sentiment Analysis, which includes its most common (and they are very common) uses. Please read this introduction before doing any coding in R.

2. *Playing*. Work through [Chapter 2](https://www.tidytextmining.com/sentiment) in [Text Mining with R](https://www.tidytextmining.com/): *Sentiment Analysis with Tidy Data.* By work through, I mean read the chapter, follow the examples, and perform each exercise in RStudio. Please save the code and visualizations you generate so you can share them.

3. *Going Further*. Now that you have some experience using R to analyze texts and visualize those analyses, it is time to go beyond the exercises. Go back to some of the texts from last week (perhaps our friend and companion *Moby Dick*) to see what there is to learn about them through sentiment analysis.
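
To make the *Going Further* step more concrete, here is a minimal sketch of a sentiment workflow in the style of Chapter 2. It assumes the `dplyr`, `tidyr`, and `tidytext` packages are installed. The two sentences in `sample_text` are made up for illustration and stand in for a real text; the choice of the Bing lexicon is also an assumption, picked because it ships with `tidytext` and needs no extra download.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Two made-up lines standing in for a real text (illustrative only).
sample_text <- tibble(
  line = 1:2,
  text = c(
    "What a happy and wonderful morning at sea.",
    "The voyage was terrible and the crew was sad."
  )
)

# Split the text into one word per row (lowercased, punctuation stripped).
tidy_words <- sample_text %>%
  unnest_tokens(word, text)

# Keep only words found in the Bing lexicon, then count by line and sentiment.
sentiment_counts <- tidy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(line, sentiment)

# One row per line, with net sentiment = positive count minus negative count.
net_sentiment <- sentiment_counts %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)

print(net_sentiment)
```

To try this on a longer text, replace `sample_text` with any tibble that has a `text` column, and consider grouping by chapter or by blocks of lines rather than by single line, as the chapter does.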


### Chunk 2: N-Grams in R

<img src="https://www.tidytextmining.com/04-word-combinations_files/figure-html/bigramtfidf-1.png" width="650" height="340">

Now that we have used some unplugged tools for data analysis and explored some computational tools (Plotting Plots and Voyant-Tools), it's time to explore some coding tools that support the kinds of textual analysis and visualization in which we have been engaged. This type of work can be done productively in both Python and R, but for this week, we will focus on a set of R tools.

Here is the work for this Chunk:

1. *Reading*. Our friends at Towards Data Science have created [this good introduction](https://towardsdatascience.com/understanding-word-n-grams-and-n-gram-probability-in-natural-language-processing-9d9eef0fa058) to N-Grams. Please read this introduction before doing any coding in R.

2. *Playing*. Work through [Chapter 4](https://www.tidytextmining.com/ngrams) in [Text Mining with R](https://www.tidytextmining.com/): *Relationships Between Words: N-grams and Correlations.* By work through, I mean read the chapter, follow the examples, and perform each exercise in RStudio. Please save the code and visualizations you generate so you can share them.

3. *Going Further*. Now that you have some experience using R to analyze texts and visualize those analyses, it is time to go beyond the exercises. Go back to some of the texts from last week (perhaps our friend and companion *Moby Dick*) to see what there is to learn about them through N-gram/correlation analysis.
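
As a starting point for the *Going Further* step, here is a minimal sketch of bigram counting in the style of Chapter 4. It assumes `dplyr`, `tidyr`, and `tidytext` are installed. The sentences in `sample_text` are made up for illustration; swap in your own tibble with a `text` column.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up sentences standing in for a real text (illustrative only).
sample_text <- tibble(
  text = c(
    "The white whale swam past the white whale.",
    "The old captain chased the whale."
  )
)

# Tokenize into overlapping pairs of adjacent words (bigrams).
bigrams <- sample_text %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Split each bigram into its two words, drop pairs that contain a stop word,
# and count what remains, most frequent first.
bigram_counts <- bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word, !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE)

print(bigram_counts)
```

Filtering stop words after separating the bigram, rather than before tokenizing, keeps word adjacency intact: removing words first would create pairs that never appeared next to each other in the text.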


### Chunk 3: BONUS: Applying What We Have Learned

This is an optional assignment. 

<img src="https://images.unsplash.com/photo-1596495718166-7ac739ca1bc4?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=2071&q=80" width="650" height="340">

#### Showing What You Know

Now that you have acquired some data analysis and visualization skills (primarily in R using *tidytext*), you can take these skills for a ride by exploring some education-related or education-adjacent text. Here are some possible texts (or sets of texts) you can explore:
1. [Education Week](https://www.edweek.org/). EdWeek is a respected source for policy related to education. 
2. [US Department of Education](https://www.ed.gov/). The US DOE has tons of text related to education, including a section on *Laws.*
3. [New York State Learning Standards](https://www.nysed.gov/next-generation-learning-standards). Learning standards are quasi-legal text documents that have enormous influence on education in the state. I have used these tools to analyze the learning standards pertaining to Computer Science and Digital Fluency. You can find that paper [here](https://d1wqtxts1xzle7.cloudfront.net/87230040/11122ijite04-libre.pdf?1654744970=&response-content-disposition=inline%3B+filename%3DA_Close_Reading_and_Analysis_of_the_New.pdf&Expires=1696862598&Signature=dubrPRpGkvwox67UWAkXHmqkRlb96tgSSKU7sQ7ZDdkHa6yXOfkO-reCvYcIts40kaXgAruVQaUtWc9DR5P2yP6AmkX377lyq94UCLXJ5-qOyrOlQfonPcda0pqp38vNNzktTCkv30DI1j47wNCLge02CIaut0Hq60RPfWYRIU1CGIvwkHWBvoIA3kqf~LvLF8rnIaH3Bg7iGg3klGlhRBEUU6b4xT~B-aA1y9KF~eYijqGcqyG~mqTs77srm4TCTeuYHkr0rEY6tobkyXeFajYV6n33E3XoIqyo8Is0dGRQsW7MJTWWJ8QZqSgD3rgEn6bjoArcQ~dGX0VBaCJTdg__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA).
4. [Dealer's Choice](https://en.wikipedia.org/wiki/Dealer's_choice). Find any education-related or education-adjacent text that you would like to explore.

Once you have done these analyses, please gather your methods and findings into a short paper (including visualizations).
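
If you are not sure how to get your own text into R, here is one possible sketch, assuming you have saved the text as a plain `.txt` file and that `dplyr` and `tidytext` are installed. The file contents below are made up and written to a temporary file only so the example runs on its own; in practice you would point `path` at your own file.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for your own file: write two made-up lines to a temp file.
path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "Students learn computing concepts.",
    "Students practice digital fluency and computing skills."
  ),
  path
)

# Read the file line by line into a tibble with one row per line.
raw_lines <- readLines(path)
doc <- tibble(line = seq_along(raw_lines), text = raw_lines)

# Tokenize into words, drop stop words, and count word frequencies.
word_counts <- doc %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

print(word_counts)
```

From a tidy table like `doc`, the sentiment and n-gram steps from the earlier chunks can be applied in the same way as in the book's examples.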



### Posting Reflections and Visualizations

<img src="https://i.imgur.com/2fTqBpU.jpeg" width="650" height="340">

You will share your analysis and reflections using the same document you began for the Week 1 work. Be as generous as you can with your sharing -- show pictures, notes, etc.

Now that you are set up in Slack, please share your reflections on the texts and your visualization project in the #learining-about-data-and-education channel.
Feel free to respond to one another and/or ask questions there as well.

**You have two weeks to complete this work. This work is due by Monday, October 23rd, end of day.**

