guestbook_katie.md

File metadata and controls

80 lines (62 loc) · 7.47 KB

Gendered Interaction Online

By Katie Thomas

Click here to see my project. Enjoy!

Guest entries:

John's Entries:

  • What was done well: Your data was very thoroughly explored and modified in a logical way! Your files and notebooks are super easy to navigate. That Fitocracy data (a program that I've never heard of) stuck out to me as super helpful for your project.
  • What can be improved: Your data appears to have a few skews that might bias your conclusions for your first goal in a few different ways. I was thinking of possibly taking a random (yet proportional) sample of the data, or doing something along those lines. For example, the FB data appears to consist of many posts by relatively few people. I don't know exactly how to approach this -- or if it's even a problem in the first place -- but I figured I'd point it out. For your ML goal, though, this shouldn't be a problem.
  • One thing I learned: The use of the .nunique() function was very cool and I'm definitely going to use that in the future for my own project. (Also, I was a little surprised by how male-dominated TED Talk responses were at the time this data was collected.) Keep up the awesome work!!! I really look forward to seeing where this project travels.
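
A minimal sketch of one way to act on the skew John describes, assuming a hypothetical DataFrame with `author` and `text` columns (these names and the toy rows are illustrative, not taken from the project). It uses `.nunique()` to compare the number of distinct authors with the number of posts, then caps each author's contribution. Capping is a different technique from the proportional sample John mentions, but it targets the same concern.

```python
import pandas as pd

# Hypothetical toy data: author "a" wrote most of the posts.
posts = pd.DataFrame({
    "author": ["a", "a", "a", "a", "a", "b", "c", "c"],
    "text": [f"post {i}" for i in range(8)],
})

# Number of distinct authors, to compare against the number of posts.
n_authors = posts["author"].nunique()

# Shuffle the rows, then keep at most MAX_PER_AUTHOR posts per author
# so that a few prolific posters cannot dominate the sample.
MAX_PER_AUTHOR = 2
capped = (
    posts.sample(frac=1, random_state=0)
         .groupby("author")
         .head(MAX_PER_AUTHOR)
)

print(f"{len(posts)} posts by {n_authors} authors; {len(capped)} after capping")
```

Whether capping is appropriate depends on the research question; for per-post statistics it may be preferable to leave the data as-is and report the skew.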

Response

Thank you for the feedback! Yes there's definitely a big skew and I will definitely think about your suggestions!

John's Second Entries:

  • What was done well: As expected, everything looks great! No corner was left unturned. I really enjoyed the visualizations of your data as well -- they really aided my understanding of what you were working with. Oh, and nice choice of license! I think the MIT license fits your project perfectly.
  • What can be improved: I could use more explanation of your data, perhaps indicating trends or possible conclusions. You have all the numbers and statistics ready, but it would be interesting to see what conclusions could be drawn from them.
  • One thing I learned: Using > and < to sort your DFs (see Output 32). That makes a lot of sense when working with numerical data, and I'm definitely interested in seeing if I can apply this method to my project.
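
For readers unfamiliar with the technique John mentions, here is a small sketch, assuming hypothetical `gender` and `post_length` columns (the notebook's Output 32 is not reproduced here). Strictly speaking, `>` and `<` filter rows rather than sort them: the comparison produces a boolean Series that selects the matching rows.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
replies = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "post_length": [120, 15, 48, 300],
})

# A comparison yields a boolean Series, which selects the matching rows.
long_posts = replies[replies["post_length"] > 100]

# Combine conditions with & and wrap each condition in parentheses.
mid_posts = replies[
    (replies["post_length"] > 20) & (replies["post_length"] < 200)
]

print(long_posts)
print(mid_posts)
```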

Response

Thanks for your input! I agree I have a lot of output and numbers, and I'm working on drawing some conclusions!

Eva's entries:

  • What was done well: The project is really well organized - it was easy to follow along with your Jupyter notebook files after reading the project plan and progress report. Comments are helpful and thoroughly explain the process. Data visualizations are helpful and easily understandable. I am really impressed with the amount of data you're working with, and the topic of your project is really interesting.
  • Improvements and suggestions: Since there are many different sources of data, I think it would be helpful for the final report to include an intro paragraph when you switch datasets. The only time I was confused by your Jupyter notebooks was in main_analysis.ipynb - there aren't a ton of comments, so sometimes I got lost when you switched between datasets.
  • What I learned: Disappointed but not really surprised that Reddit is a male-dominated site. For the Fitocracy data, it was interesting that more women reply to men's posts and more men reply to women's posts.

Response

Thanks for your post! Yes, I have a lot of data and will definitely work on making some smoother transitions between the different types!

Cassie's entries:

  • What I Liked: Your goals for this project and the steps you're taking to meet these goals are mapped out really well and are easy to follow. Also, the conclusions.md file is a great place to get a quick summary of your findings without having to navigate through jupyter notebooks.
  • What Can Be Improved: The motivations behind some of the things you do in your data modifications could be made a little clearer. For instance, why do you decide to replace null posts with an empty string instead of getting rid of them? Are any of these included in your 50,000 post sample?
  • What I Learned: Ways to incorporate k-bands into a dataframe and text analysis. I'm also analyzing gender dynamics for my project, so a method like this will be really useful!
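
Cassie's question about null posts can be checked directly. The sketch below assumes a hypothetical DataFrame with `post_id` and `text` columns and a tiny sample size; it contrasts the two options (fill with an empty string versus drop) and counts how many empty posts land in a random sample.

```python
import pandas as pd

# Hypothetical data: two of the four posts have no text.
posts = pd.DataFrame({
    "post_id": [1, 2, 3, 4],
    "text": ["hello", None, "nice run!", None],
})

n_null = posts["text"].isna().sum()

# Option A: keep every row and replace missing text with "".
filled = posts.assign(text=posts["text"].fillna(""))

# Option B: drop the rows that have no text.
dropped = posts.dropna(subset=["text"])

# Count how many empty posts end up in a random sample of the filled data.
sample = filled.sample(n=3, random_state=0)
n_empty_in_sample = (sample["text"] == "").sum()

print(n_null, len(filled), len(dropped), n_empty_in_sample)
```

Filling keeps row counts (and any per-user post totals) intact, while dropping keeps empty strings out of length and text statistics; which trade-off fits is a judgment call for the analysis.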

Response

Thanks for the post! That's a good question - I'll have to reexamine the null posts and see about that.

Elena's entries:

2019.04.02

  • What I Liked: Your exploratory files are pretty well organized! I really like how you have a separate conclusions markdown.
  • What Can Be Improved: The link to your main analysis in the second progress report 404s. It looks like you changed the location of the file but didn't update the link. You may also want to link to your analysis notebook at the top or if you have any graphs that you find particularly interesting, save them in the notebook and see if you can upload or embed them? Also -- are you going to run any stats on your results?
  • What I Learned: The analysis of gender differences in responses to various genders across different online platforms is interesting! Like, on Reddit, everyone has longer responses to female posters than male ones, but on Fitocracy, both groups have longer responses to same-group posters than to the opposite group.

Response

Thanks for the post! Also thanks for catching that link - I had switched its location and forgot to update the progress report. Definitely going to work more on the graphs and making them more accessible.

Tingwei's entries:

2019.04.02

  • What I Liked: Your analysis is really great, and the data sets are sufficient. You have utilized a lot of different resources. It seems that the results will be quite substantial.
  • What Can Be Improved: The analysis looks nice and neat, but I am looking forward to seeing some conclusions from these statistics. That would help me understand your project.
  • What I Learned: The text analysis is great and clear. I may employ some methods to help me quickly get the result with charts, ex: post_length with gender.

Response

Thanks for the post! Yes, I definitely need to continue to develop and put together my conclusions.

Matt's Entry:

  • What I Liked: Using machine learning, and one of your sources being from Reddit, what an odd coincidence. Also, decent use of markdown to explain your thought process.
  • What Can Be Improved: Have you considered using additional features in your machine learning (e.g. sentence length) alongside the text? Also, I highly recommend pd.crosstab for confusion matrices; it keeps the row/column labels.
  • What I Learned: Getting prediction probabilities from a classifier/model. I'll have to remember that if I assemble a notebook for presentational purposes.
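
A short sketch of Matt's `pd.crosstab` suggestion, using made-up true and predicted gender labels (the label values and Series names are assumptions, not the project's actual output). Naming the two Series is what gives the resulting table its row and column labels.

```python
import pandas as pd

# Hypothetical true and predicted labels from a gender classifier.
y_true = pd.Series(["F", "F", "M", "M", "M", "F"], name="actual")
y_pred = pd.Series(["F", "M", "M", "M", "F", "F"], name="predicted")

# Rows are the actual labels, columns are the predicted labels.
confusion = pd.crosstab(y_true, y_pred)

print(confusion)
```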

Patrick's Entry:

  • What I liked: I like the organization of your project, and separating your machine learning and main analysis into different ipynb files makes it more logical. Overall, you show your dataframes and use visualizations a lot, which makes it clearer what you're doing.
  • What can be improved: I haven't taken statistics but some of your data seem like they could call for statistical analyses. For example some of the hedge frequencies for certain corpora seem fairly close (e.g. 0.05 vs 0.039). Maybe these aren't actually statistically different and don't contradict your hypothesis?
  • What I learned: Gender can manifest itself in online written text in weird ways (like the fact that women tend to write longer posts for example).
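
One way to probe Patrick's question is a two-proportion z-test, sketched below with the standard library only. The counts are hypothetical, chosen so the rates come out to 0.05 and 0.039; the project's real token counts are not given here. The test also assumes independent observations, which is only an approximation for words in running text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: the same rates (0.05 vs 0.039) at two corpus sizes.
z_small, p_small = two_proportion_z_test(50, 1_000, 39, 1_000)
z_large, p_large = two_proportion_z_test(5_000, 100_000, 3_900, 100_000)

print(f"n=1,000 each:   z={z_small:.2f}, p={p_small:.3f}")
print(f"n=100,000 each: z={z_large:.2f}, p={p_large:.3g}")
```

With these made-up counts, the same gap in rates is not significant at 1,000 tokens per corpus but is at 100,000, so whether 0.05 and 0.039 really differ depends on the corpus sizes.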