# EDA

In this project we are looking at the titles and comments from r/SteamDeck and r/linux_gaming. In this notebook we will be exploring the datasets and seeing what information can be gleamed, along with some potential minor cleaning along the way. 

### Imports

In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer

In [38]:
# Allowing more text from the Title and Comments to be shown.
pd.options.display.max_colwidth = 400

In [39]:
# Grabbing the Title data and storing it in a dataframe
titles = pd.read_csv('../data/linux_gaming_SteamDeck_merged_title_data.csv')

In [40]:
# Grabbing the Comment data and storing it in a dataframe
comments = pd.read_csv('../data/linux_gaming_SteamDeck_merged_comment_data.csv')

In [41]:
titles.head()

Unnamed: 0,title_id,title,subreddit
0,12ysmfy,Improved Wine gaming with exeCute,linux_gaming
1,12yb3qz,There is a lot of native Linux games. What would you recomend?,linux_gaming
2,12yfibp,What is your favorite open source Linux game? Mine is Wideland (Best way to describe is the way Settlers 3 should have been),linux_gaming
3,12yagh3,Wiki is getting attacked by spam bots!,linux_gaming
4,12yg5qp,PSA: the 8bitdo ultimate (bluetooth + 2.4gz) controller doesn't work out of the box on most linux distributions,linux_gaming


In [42]:
comments.head()

Unnamed: 0,comment_id,comments,subreddit
0,jhjsr22,I’m sorry you lost all your friends from gaming too much,SteamDeck
1,jhjz43t,"This feels like an all too poignant commentary on my life. A bunch of games, no friends and no money.",SteamDeck
2,jhjtkl6,"I also have 0,00 Money!",SteamDeck
3,jhjrdnr,I was worried your friend count had gone up. It was only that your games quadrupled. Whew!,SteamDeck
4,jhk1d07,Haha. I honestly thought the 100 referred to cash in the wallet and the joke being about losing 100 friends. Just realised it was about the 100 games. 🤦🏼‍♀️,SteamDeck


Let's look at how much data we have. 

In [44]:
titles.shape

(3113, 3)

In [45]:
comments.shape

(49960, 3)

## Looking at lengths
We have a good amount of titles and comments. Let's add length and word count column to get an idea of the individual sizes of the titles and comments

In [46]:
titles['title_length'] = [len(title) for title in titles['title']]
titles['title_word_count'] = [len(title.split()) for title in titles['title']]

In [47]:
comments['comment_length'] = [len(comment.split()) for comment in comments['comments']]
comments['comment_word_count'] = [len(comment.split()) for comment in comments['comments']]

In [48]:
titles.head()

Unnamed: 0,title_id,title,subreddit,title_length,title_word_count
0,12ysmfy,Improved Wine gaming with exeCute,linux_gaming,33,5
1,12yb3qz,There is a lot of native Linux games. What would you recomend?,linux_gaming,62,12
2,12yfibp,What is your favorite open source Linux game? Mine is Wideland (Best way to describe is the way Settlers 3 should have been),linux_gaming,124,23
3,12yagh3,Wiki is getting attacked by spam bots!,linux_gaming,38,7
4,12yg5qp,PSA: the 8bitdo ultimate (bluetooth + 2.4gz) controller doesn't work out of the box on most linux distributions,linux_gaming,111,18


In [49]:
comments.head()

Unnamed: 0,comment_id,comments,subreddit,comment_length,comment_word_count
0,jhjsr22,I’m sorry you lost all your friends from gaming too much,SteamDeck,11,11
1,jhjz43t,"This feels like an all too poignant commentary on my life. A bunch of games, no friends and no money.",SteamDeck,20,20
2,jhjtkl6,"I also have 0,00 Money!",SteamDeck,5,5
3,jhjrdnr,I was worried your friend count had gone up. It was only that your games quadrupled. Whew!,SteamDeck,17,17
4,jhk1d07,Haha. I honestly thought the 100 referred to cash in the wallet and the joke being about losing 100 friends. Just realised it was about the 100 games. 🤦🏼‍♀️,SteamDeck,29,29


Let's see what the **shortest** and **longest** titles and comments look like.

In [51]:
titles.sort_values(by='title_word_count', ascending=True).head(10)

Unnamed: 0,title_id,title,subreddit,title_length,title_word_count
2246,12t5ocr,Emulation,SteamDeck,9,1
25,12y9sod,VrChat,linux_gaming,6,1
2088,12tndko,Help!,SteamDeck,5,1
2719,12q9ypr,Emudeck/dolphin,SteamDeck,15,1
366,12my173,Skyrim,linux_gaming,6,1
2200,12t71wu,Vortex,SteamDeck,6,1
1889,12umspf,Help,SteamDeck,4,1
311,12og981,RE4R,linux_gaming,4,1
1503,12wchop,Achievements?,SteamDeck,13,1
2792,12q5gdn,Repair,SteamDeck,6,1


In [52]:
titles.sort_values(by='title_word_count', ascending=False).head(10)

Unnamed: 0,title_id,title,subreddit,title_length,title_word_count
1808,12uzwuo,Does An SSD make that much of a difference in games running on deck ?. im gonna be daily driving it for a couple weeks while my PC is in the shop. I assume maybe load times are like 10-15% better but for like 50% of the price extra idk doesnt seem worth it for deck with how low performance it is,SteamDeck,296,62
2533,12rrksu,"Does anyone know how to access the yellow bar and find out what it is? Seems like every time I download a game it gets bigger. If it keeps up I’m gonna run out of space, every game I’ve downloaded is on my SD card. Thought it was the protons but they are listed as games or the blue bar…",SteamDeck,287,60
2867,12vmsuh,Darn good first attempt at applying a skin cover. I found its best when working the last bits to be gentle with pushing the rest down; don't try to FORCE the bend to go where you want or it'll crease. Just heat and work little bits at a time. Kinda bummed there isn't a red and blue back as well.,SteamDeck,296,60
2540,12re9sc,"I wanted to use it to play Guilty Gear strive modded, I did do it manually once but Unverum helps organize it, there was another post of someone who had trouble running unverum but I need more fleshed out steps on what to do, for example when I try opening the application it asks me this",SteamDeck,288,56
509,12gf4pt,"planning to switch to linux but am doubtful about shader cache stuttering in dx12 games. I know dx11 is fine now. Also am planning to play dx12 games not from the steam store, hence no shader pre-caching. Should I stay on Windows if dx12 stutters havent been fixed yet? (i cant stand them)",linux_gaming,289,53
2782,12q7gy5,"I get ""Your Steam Deck is connected to a slow charger, below the recommended rating to consistently charge Steam Deck. Depending on the charger and what you're doing on Steam Deck, your battery may continue to drain"" no matter the charging cable our outlet on my brand new top of the line model..",SteamDeck,296,53
1593,12wdyk4,"Is there any way to free some space from this STOOPID thing, ik that there’s an app that lets you go really deep into your files and delete them or something but idk what to delete and I’m afraid if I do it will do something to the game or the deck.",SteamDeck,249,52
2093,12ud5sy,Noticed a few people posting crazy set ups well. I don't want to bust out everything since it is packed away for a trip but my GF's dad printed me this little set up for my Killswitch and my hub so they can be attached and still use the kickstand.,SteamDeck,247,50
1158,12xqyxr,"I've used tools to move shader caches and compatdata to sd cards, have no games installed on the internal storage, but my 64GB deck still has 40GB of 'other' that I have no control over or ability to identify. What is going on? How can I fix this?",SteamDeck,247,48
1287,12x1bhk,I'm trying to start Resident Evil 4 but my WiFi is currently out. Is there any way you get around this in offline mode? I've played multiple times on offline mode so I don't know why it's doing this now. Happening with all my games as well.,SteamDeck,240,47


Looking at shortest titles we can immediately notice some potential issues for later. 
* Combinging words with the `/` symbol
* Words with LEET speak

In [53]:
comments.sort_values(by='comment_word_count', ascending=True).head(10)

Unnamed: 0,comment_id,comments,subreddit,comment_length,comment_word_count
41632,jf2toxt,[deleted],linux_gaming,1,1
7768,jhfxzch,r/windowsondeck,SteamDeck,1,1
7769,jhfy07p,r/windowsondeck,SteamDeck,1,1
19347,jhkirk6,Indeed!,SteamDeck,1,1
28088,jgnihj9,OK,SteamDeck,1,1
7808,jhayxo4,Same.,SteamDeck,1,1
41031,jffpbqc,GPU?,linux_gaming,1,1
7831,jhckoa2,Same...,SteamDeck,1,1
1571,jhjg5ni,[deleted],SteamDeck,1,1
28120,jgmfqi7,r/whoosh,SteamDeck,1,1


In [54]:
comments.sort_values(by='comment_word_count', ascending=False).head(10)

Unnamed: 0,comment_id,comments,subreddit,comment_length,comment_word_count
40575,jfjqw83,"> On my nvidia system Wayland works perfectly\n\nNo it doesn't.\n\n- X11 desktop apps on wayland don't work and are rendering random flickering frames that keep jumping back and forth between old and new frames, until this heavily contested pull request from NVIDIA is merged by Xwayland, because NVIDIA has absolutely **zero rendering synchronization** between the framebuffers and the screen: h...",linux_gaming,1110,1110
37594,jgcl96t,"LONG-WINDED ALERT! (the tl;dr is: welcome, explore, and hopefully love what you find but I typically don't advise ""going cold turkey"" / ""diving off the cliff"" approaches to switching from X to Linux.\n\n-----------------------------------------------\n\nAs one who started dabbling with Linux on the side in 2018, still keeping macOS my main ""podcasting OS"" and Windows my main ""gaming OS"" (the t...",linux_gaming,965,965
41975,jf3t74q,"Plus I'm assuming your laptop is supported by OpenRazer? That's a big upside if so. \n\nI can't speak much on hybrid graphics laptops, but as someone who has been building their own desktop gaming rigs for 4 years exclusively for Linux, I started with an RX 580 cause I'd heard all the approaching-misinformation community ""wisdom"" regarding NV GPUs. I never bought an Nvidia GPU and went Vega In...",linux_gaming,958,958
38445,jg54ire,"If I'm not mistaken, the easiest distro to do passthrough on is Arch, but unless you have specific games that you know won't work and really care about, I wouldn't go that route since it can be pretty complicated. If you really want that route, you're gonna need to use libvirt and KVM. Check out virtmanager (a front end for it) to get started. Here's [a guide](https://clayfreeman.github.io/gpu...",linux_gaming,906,906
14506,jhbmah0,"I don't recommend an SD Card (and especially not a MicroSD card). An SSD is absolutely fine, though.\n\nI'll explain why in the next few paragraphs, but **a quick, high-level overview of why an SSD is fine is because it's designed to be used as a boot drive**.\n\n---\n\nA small disclaimer: I'm using analogies to explain how flash memory cells work. I tried my best to be as accurate as possible...",SteamDeck,842,842
41245,jfbglaa,"You really used NVIDIA drivers as a selling point on Linux? They are very well known for breaking on Linux. And I have used exclusively NVIDIA on Linux for years. They have broken several times. They aren't as bad as most Linux users make them out to be but they are a hell of a lot better and more advanced on Windows.\n\nFor NVIDIA, you need Windows if you want features such as built-in per-ga...",linux_gaming,807,807
20902,jhmlsh8,"Signal path is this: FSR 1.0 based upscaling is very light on resources, as in ""costing a frame or so"" compared to bilinear upscaling. Target resolution almost doesnt matter, as long as the source resolution stays the same. CPU cost will always be ""arround one frame"" (let it be 2 if it scales to 1440p and that would be an unexpected 100% performance cost increase) when compared to bilinear up...",SteamDeck,798,798
48840,jdcfvas,"I have ran steam games from my NTFS drives from my Linux install for some time.\n\nthe NTFS must be mounted with the proper options, and I recall there may need to be some other tweaks/link done. (but I can't recall doing that part)\n\n\n-------\n\nNotes I made for people trying to use steam under Linux and keeping game files on a NTFS partition. Notes on ext4 filesystem at the end.\n\nAl...",linux_gaming,755,755
27334,jgunlh5,"Sigh..\n\nMy friend, just because you make something **BOLD**, doesn't mean it's right.\n\nSeveral people have tried to tell you via comments and via downvotes, that you are incorrect.\n\nYou should drop it on this one.\n\nAnti Aliasing, and computer graphics in general, are a fascinating topic and I encourage you to keep your enthusiasm for it, but please try to educate yourself before rantin...",SteamDeck,745,745
49890,jhn8hjy,"I run my steam games under Linux, with a steam library on a NTFS partition.\n\n-------\n\nNotes I made for people trying to use steam under Linux and keeping game files on a NTFS partition. Notes on ext4 filesystem at the end.\n\nAlso I Found this Guide - which may be better or have some details I overlook. \n\nhttps://github.com/ValveSoftware/Proton/wiki/Using-a-NTFS-disk-with-Linux-and-W...",linux_gaming,742,742


Now looking at the shortest comment data shows us something else that will be problematic, posts that have been deleted. This one is easy to fix. We can simply drop observations labeled `[deleted]`

In [55]:
comments.drop(comments[comments['comments'] == '[deleted]'].index, inplace=True)

In [56]:
comments.sort_values(by='comment_word_count', ascending=True).head(10)

Unnamed: 0,comment_id,comments,subreddit,comment_length,comment_word_count
9074,jh6mp70,Excellentttttt,SteamDeck,1,1
38638,jg1nkbd,Ah,linux_gaming,1,1
3254,jhidiok,F,SteamDeck,1,1
8741,jheozsa,Thanks!,SteamDeck,1,1
29740,jgq6frq,[https://en.wikipedia.org/wiki/Cathode-ray\_tube](https://en.wikipedia.org/wiki/Cathode-ray_tube),SteamDeck,1,1
33707,jh9p52v,r/FuckEpic,SteamDeck,1,1
38709,jg6dl2w,Yes.,linux_gaming,1,1
12167,jh14oen,WLAN-Disconnects…,SteamDeck,1,1
15859,jgwu1bd,BOTW,SteamDeck,1,1
8671,jhfbtk9,:),SteamDeck,1,1


In [57]:
comments[comments['comments'] == '[deleted]'].head()

Unnamed: 0,comment_id,comments,subreddit,comment_length,comment_word_count


In [58]:
titles.drop(titles[titles['title'] == '[deleted]'].index, inplace=True)

In [59]:
titles[titles['title'] == '[deleted]'].head()

Unnamed: 0,title_id,title,subreddit,title_length,title_word_count


Other pattern of text that is being read as one word of text are urls, along wtih words connected by `-`