# Sentiment analysis of open-source software communities

This Jupyter notebook includes the data preparation and analysis
for our project exploring open-source software communities.

**Code last updated**: 27 October 2018

***

## Preliminaries

In [8]:
import os
import pandas as pd

In [22]:
os.chdir('../../data/mayavi')

### Read in `comments.tsv`

**Original file contents**
* `author_association`: the comment author's role in the project
    * one of the following: `NONE`, `CONTRIBUTOR`, `MEMBER`, `COLLABORATOR`, `OWNER`
* `body`: content of comment
* `created_at`: time of comment creation
* `id`: unique identifier of entry
* `node_id`: unique identifier of entry for GraphQL
* `updated_at`: time of comment update
* `ticket_id`: sequential identifier of the ticket (issue or pull request) that the comment belongs to
* `author_name`: commenter's GitHub username
* `author_id`: unique author identifier

**Cleanup**
* Remove `node_id`
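
The cleanup step above can be sketched on a small stand-in for `comments.tsv`. The two sample rows, the names `sample_tsv`, `sample_df`, and `cleaned_df`, and the choice to parse the timestamp columns as datetimes are all assumptions made for illustration; only the column names, the tab separator, and the removal of `node_id` come from this notebook.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for comments.tsv
# (tab-separated, with an unnamed first column used as the index).
sample_tsv = (
    "\tauthor_association\tbody\tcreated_at\tid\tnode_id\t"
    "updated_at\tticket_id\tauthor_name\tauthor_id\n"
    "0\tMEMBER\tfirst comment\t2011-04-25 15:45:36\t101\tabc=\t"
    "2011-04-25 15:45:36\t5\talice\t1\n"
    "1\tNONE\tsecond comment\t2011-04-25 15:46:44\t102\tdef=\t"
    "2011-04-25 15:50:00\t5\tbob\t2\n"
)

sample_df = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",
    index_col=0,
    # Assumption: the timestamps are wanted as datetimes rather than strings.
    parse_dates=["created_at", "updated_at"],
).sort_index()

# Drop the GraphQL identifier, as listed under Cleanup.
cleaned_df = sample_df.drop(columns=["node_id"])
print(cleaned_df.columns.tolist())
```

`drop` returns a new frame, so `sample_df` keeps its `node_id` column unless the result is assigned back to the same name.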

In [23]:
comments_df = pd.read_csv('comments.tsv',
                          sep='\t', index_col=0).sort_index()

In [21]:
comments_df['author_association'].unique()

array(['CONTRIBUTOR', 'MEMBER', 'COLLABORATOR', 'NONE', 'OWNER'],
      dtype=object)
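
The same check can be written as an explicit comparison against the five documented values, which also flags anything unexpected. This is a minimal sketch on a hypothetical stand-in series; `EXPECTED_ASSOCIATIONS`, `associations`, `unexpected`, and `counts` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Values listed in the column description for `author_association`.
EXPECTED_ASSOCIATIONS = {"NONE", "CONTRIBUTOR", "MEMBER", "COLLABORATOR", "OWNER"}

# Hypothetical stand-in for comments_df['author_association'].
associations = pd.Series(["CONTRIBUTOR", "MEMBER", "NONE", "MEMBER", "OWNER"])

# Any value outside the documented set ends up here (missing values are ignored).
unexpected = set(associations.dropna().unique()) - EXPECTED_ASSOCIATIONS

# Number of comments per role.
counts = associations.value_counts()

print(unexpected)
print(counts)
```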

In [24]:
comments_df.head(10)

| Unnamed: 0 | author_association | body | created_at | id | node_id | updated_at | ticket_id | author_name | author_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | COLLABORATOR | Very nice. Thanks a lot. Could you integrate y... | 2011-04-25 15:45:36 | 1053358 | MDEyOklzc3VlQ29tbWVudDEwNTMzNTg= | 2011-04-25 15:45:36 | 5 | GaelVaroquaux | 208217 |
| 1 | CONTRIBUTOR | Yes of course. | 2011-04-25 15:46:44 | 1053363 | MDEyOklzc3VlQ29tbWVudDEwNTMzNjM= | 2011-04-25 15:46:44 | 5 | Snegovikufa | 413925 |
| 2 | NONE | It would also be nice to report this bug upstr... | 2011-04-25 15:53:18 | 1053385 | MDEyOklzc3VlQ29tbWVudDEwNTMzODU= | 2011-04-25 15:53:18 | 5 | epatters | 316610 |
| 3 | CONTRIBUTOR | I'm not sure: is this merge request correct no... | 2011-04-25 16:07:27 | 1053437 | MDEyOklzc3VlQ29tbWVudDEwNTM0Mzc= | 2011-04-25 16:07:27 | 5 | Snegovikufa | 413925 |
| 4 | COLLABORATOR | @epatters: +1 @Snegovikufa: is QT_API a stand... | 2011-04-25 16:07:59 | 1053442 | MDEyOklzc3VlQ29tbWVudDEwNTM0NDI= | 2011-04-25 16:07:59 | 5 | GaelVaroquaux | 208217 |
| 5 | COLLABORATOR | The merge looks good, I just had this question... | 2011-04-25 16:08:52 | 1053452 | MDEyOklzc3VlQ29tbWVudDEwNTM0NTI= | 2011-04-25 16:08:52 | 5 | GaelVaroquaux | 208217 |
| 6 | CONTRIBUTOR | When I'm using Mayavi, variable QT_API is set ... | 2011-04-25 16:11:18 | 1053472 | MDEyOklzc3VlQ29tbWVudDEwNTM0NzI= | 2011-04-25 16:11:18 | 5 | Snegovikufa | 413925 |
| 7 | COLLABORATOR | OK. Great. I don't know where it comes from, b... | 2011-04-25 16:13:16 | 1053487 | MDEyOklzc3VlQ29tbWVudDEwNTM0ODc= | 2011-04-25 16:13:16 | 5 | GaelVaroquaux | 208217 |
| 8 | COLLABORATOR | Thanks. It is fixed in the source code, but we... | 2011-05-18 11:23:08 | 1196367 | MDEyOklzc3VlQ29tbWVudDExOTYzNjc= | 2011-05-18 11:23:08 | 6 | GaelVaroquaux | 208217 |
| 9 | COLLABORATOR | What happens if you comment out only the first... | 2011-05-23 05:25:33 | 1219945 | MDEyOklzc3VlQ29tbWVudDEyMTk5NDU= | 2011-05-23 05:25:33 | 7 | GaelVaroquaux | 208217 |


In [30]:
comments_df['body'][4]

"@epatters: +1  @Snegovikufa: is QT_API a standard environment variable automatically set? If not it would be useful to add a comment in the documentation describing when to set it, at docs/source/mayavi/build_applications.rst, in the 'integrating_pyqt' node, next to the remark about pyside. "

### Read in `issues.tsv`

In [25]:
issues_df = pd.read_csv('issues.tsv',
                        sep='\t', index_col=0).sort_index()

In [26]:
issues_df.head(10)

| Unnamed: 0 | assignees | author_association | body | closed_at | comments | created_at | id | labels | locked | node_id | ... | title | updated_at | project | organization | author_name | author_id | ticket_id | type | num_PR_created | num_issue_created |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 |  | NONE |  | 2018-10-11 07:08:17 | 0 | 2018-10-11 07:08:10 | 368981124 |  | False | MDU6SXNzdWUzNjg5ODExMjQ= | ... | python3 | 2018-10-11 07:08:17 | mayavi | enthought | icevoicey | 30742101 | 725 | issue | 0 | 0 |
| 1 |  | MEMBER | If OSMesa is available and user requests an of... | 2018-10-11 04:00:31 | 1 | 2018-10-11 03:49:59 | 368941830 |  | False | MDExOlB1bGxSZXF1ZXN0MjIxOTk4MzU5 | ... | Try and fix #477. | 2018-10-11 04:00:34 | mayavi | enthought | prabhuramachandran | 272585 | 724 | pull_request | 93 | 7 |
| 2 |  | MEMBER | Creating a renderwindow in some configurations... | 2018-10-09 19:36:26 | 1 | 2018-10-09 18:24:08 | 368337412 |  | False | MDExOlB1bGxSZXF1ZXN0MjIxNTM4Nzgw | ... | Improve offscreen window creation. | 2018-10-09 19:36:29 | mayavi | enthought | prabhuramachandran | 272585 | 723 | pull_request | 92 | 7 |
| 3 |  | NONE | This bug manifests when the SurfaceSource obje... |  | 2 | 2018-10-09 15:08:38 | 368259788 |  | False | MDExOlB1bGxSZXF1ZXN0MjIxNDc3Nzk0 | ... | Fix bug related to SurfaceSource.scalars | 2018-10-09 15:48:02 | mayavi | enthought | rahulporuri | 1926457 | 722 | pull_request | 0 | 1 |
| 4 |  | NONE | Hi, I am new to Mayavi. I have just installed ... |  | 1 | 2018-10-09 11:39:39 | 368167895 |  | False | MDU6SXNzdWUzNjgxNjc4OTU= | ... | from mayavi import mlab not working | 2018-10-10 21:33:01 | mayavi | enthought | Love-Chrissie | 31875095 | 721 | issue | 0 | 0 |
| 5 |  | NONE | Hi, I have a custom app on top of Mayavi, tha... | 2018-10-09 08:49:06 | 4 | 2018-10-03 09:41:53 | 366253893 |  | False | MDU6SXNzdWUzNjYyNTM4OTM= | ... | How to use info.ui.dispose() with mayavi 4.6.2 | 2018-10-09 08:49:06 | mayavi | enthought | rc | 44282 | 720 | issue | 0 | 2 |
| 6 |  | NONE | Hi, I'm getting this error when I try to impo... |  | 1 | 2018-09-29 19:32:04 | 365160630 |  | False | MDU6SXNzdWUzNjUxNjA2MzA= | ... | TraitError | 2018-10-10 21:34:28 | mayavi | enthought | EhsanTadayon | 4405049 | 719 | issue | 0 | 0 |
| 7 |  | NONE | Hi, OSX 10.13.6 with prerequisites from conda,... | 2018-09-24 14:14:37 | 2 | 2018-09-24 12:16:33 | 363120614 |  | False | MDU6SXNzdWUzNjMxMjA2MTQ= | ... | Getting PyThreadState_Get: no current thread ... | 2018-09-24 14:15:01 | mayavi | enthought | gnurser | 6864797 | 718 | issue | 0 | 2 |
| 8 |  | NONE | I don't know if anybody here gives the Linux d... |  | 1 | 2018-09-23 21:32:06 | 362979267 |  | False | MDU6SXNzdWUzNjI5NzkyNjc= | ... | Upcoming Ubuntu 18.10 will still have Mayavi 4... | 2018-09-23 22:14:25 | mayavi | enthought | daver1691 | 8401803 | 717 | issue | 0 | 0 |
| 9 |  | MEMBER |  | 2018-09-23 19:51:02 | 1 | 2018-09-23 19:43:30 | 362971238 |  | False | MDExOlB1bGxSZXF1ZXN0MjE3NTI3MTY5 | ... | Fix #713. | 2018-09-23 19:52:28 | mayavi | enthought | prabhuramachandran | 272585 | 716 | pull_request | 91 | 7 |
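
Both files carry a `ticket_id` column, so one possible sanity check is to count comment rows per ticket and compare the result with the `comments` column of the issues table. The sketch below assumes that `ticket_id` refers to the same ticket in both files and that `comments` is a per-ticket comment count; neither is stated explicitly in this notebook. The frames `comments_sample` and `issues_sample` and the column `observed_comments` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for comments_df and issues_df (only the columns used here).
comments_sample = pd.DataFrame(
    {"ticket_id": [5, 5, 6], "body": ["a", "b", "c"]}
)
issues_sample = pd.DataFrame(
    {
        "ticket_id": [5, 6, 7],
        "type": ["pull_request", "issue", "issue"],
        "comments": [2, 1, 0],
    }
)

# Count comment rows per ticket.
observed = (
    comments_sample.groupby("ticket_id")
    .size()
    .rename("observed_comments")
    .reset_index()
)

# A left join keeps tickets that have no comment rows at all.
merged = issues_sample.merge(observed, on="ticket_id", how="left")
merged["observed_comments"] = merged["observed_comments"].fillna(0).astype(int)

print(merged)
```

Rows where `comments` and `observed_comments` disagree would point to missing or extra comment rows, provided the assumptions above hold for the real data.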
