## New Notebook to Work on Filtering our Repositories (Owen)

New initial repo-filtering terms are needed so that Digital Humanities and Cultural Analytics repositories are also represented in the hack, tact, and yack counts.

In [51]:
import pandas as pd

### Descriptions in Digital Humanities Repositories

In [52]:
digital_humanities_2022_repo_data = pd.read_csv(
    "repo_data/digital_humanities/repos_searched_Digital+Humanities_2022.csv")
digital_humanities_2022_repo_data[['description']][:40]


| | description |
|---|---|
| 0 | Materials for the Advanced Digital Humanities ... |
| 1 | Digital Humanities and the Climate Toolkit (dr... |
| 2 | Tutorials and lab files for 16:560:670:01 Digi... |
| 3 | Practicing the Digital Humanities, UC Berkeley... |
| 4 | 数字人文教程代码合集 |
| 5 | Software citation in the Digital Humanities |
| 6 | 🦊 Digital Humanities knowledge, projects, rela... |
| 7 | UFO project for Digital Humanities |
| 8 | |
| 9 | |


### Descriptions in Cultural Analytics Repositories

In [53]:
# Let's see what's in the descriptions of the cultural analytics repositories
cultural_analytics_repo_data = pd.read_csv(
    "repo_data/cultural_analytics/repos_searched_Cultural+Analytics.csv")
cultural_analytics_repo_data[['description']][:40]

| | description |
|---|---|
| 0 | Introduction to Cultural Analytics & Python, c... |
| 1 | Repository for Cultural Analytics (Fall 2019) |
| 2 | Code for the Oregon State University Course NM... |
| 3 | Articles from CA |
| 4 | Experiments and tutorials from the wide field ... |
| 5 | Computational study of change in religious sen... |
| 6 | Multiple formats of articles from the journal ... |
| 7 | Computational Cultural Analytics |
| 8 | Teaching with Cultural Analytics |
| 9 | Computational Cultural Analytics |


### Descriptions in Digital Cultural Heritage Repositories

In [54]:
digital_cultural_heritage_repo_data = pd.read_csv(
    "repo_data/digital_cultural_heritage/repos_searched_Digital+Cultural+Heritage.csv")
digital_cultural_heritage_repo_data[['description']][:40]

| | description |
|---|---|
| 0 | Notebooks for teaching Named Entity Recognitio... |
| 1 | Numishare is an open source suite of applicati... |
| 2 | A web application developed by Zepheira for th... |
| 3 | Environmentally Sustainable Digital Preservati... |
| 4 | eCHO: Enriching The Digital Representation of ... |
| 5 | Digital Access to Cultural Heritage file forma... |
| 6 | Omeka S is a web publishing platform for digit... |
| 7 | Digital Storytelling for Cultural Heritage |
| 8 | Create trustless and transparent Timestamps fo... |
| 9 | Portfolio of Virtual Restoration of Cultural H... |


### Descriptions in Public History Repositories
We want to see which non-teaching repositories could be captured accidentally.

In [64]:
public_history_repo_data = pd.read_csv(
    "repo_data/public_history/repos_searched_Public+History.csv")
public_history_repo_data[['description']][:40]

| | description |
|---|---|
| 0 | Daily snapshots of public Spotify playlists |
| 1 | The Phantom Legion History |
| 2 | GrSecurity and PaX Patches Before End of Publi... |
| 3 | Systematic coin price notifier, Telegram publi... |
| 4 | A chat server with OAuth2 authentication, pers... |
| 5 | Public versions of WICED since 2.4.0 |
| 6 | History of Tizen. This project is released in... |
| 7 | View the history of public and world readable ... |
| 8 | FamiTracker public source code history |
| 9 | 微信公众号历史文章、点赞数及阅读数的爬取 |

## Establishing New Pedagogy Terms

In [56]:
new_pedagogy_terms = [
    'projects',
    'project',
    'tutorials',
    'tutorial', 
    'school',
    'course',
    'training',
    'materials',
    'intro',
    'introduction',
    'intro to',
    'practice',
    'practicing',
    'toolkit',
    'term',
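    'Experiments',  # NOTE: capitalized, so it never matches the lowercased columns (count is always 0)
    'University',  # NOTE: capitalized, so it never matches the lowercased columns (count is always 0)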
    'articles',
    'study',
    'textbook',
    'analysis',
    'teaching',
    'computational study',
    'comparative',
    'access',
    'digital history',
    'digital art history',
    'digital storytelling',
    'digital approaches',
    'preserving',
    'portfolio',
    'virtual restoration',
    'scripts',
    'exercise',
    'workshop',
    'instructional',
    'assignment',   
] 
# Potential words to avoid: 'public', 'record', 'inform'
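
Because `str.contains` matches substrings, short terms can match inside unrelated words: `term` also matches "determine", and `intro` matches every "introduction". One possible refinement is whole-word matching. The following is a minimal sketch, assuming a hypothetical helper `whole_word_mask` and toy descriptions; neither comes from the notebook.

```python
import re

import pandas as pd


def whole_word_mask(text_column, term):
    """Return a boolean mask for rows where `term` appears as a whole word or phrase."""
    # re.escape guards against terms containing regex metacharacters;
    # \b anchors the match at word boundaries.
    pattern = r"\b" + re.escape(term) + r"\b"
    return text_column.str.contains(pattern, regex=True, na=False)


# Toy descriptions, invented for illustration only.
toy = pd.Series([
    "determine the price of coins",
    "spring term course materials",
    None,
])

# Plain substring matching, as in the loops in this notebook.
substring_hits = toy.str.contains("term", regex=False, na=False)
# Whole-word matching skips "determine" but keeps "spring term".
whole_word_hits = whole_word_mask(toy, "term")
print(substring_hits.tolist())
print(whole_word_hits.tolist())
```

Whether whole-word matching is preferable depends on the term: it would stop `intro` from matching "introduction", which may or may not be desired given that both appear in the list.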

### Looping Through Digital Humanities

In [57]:
digital_humanities_2022_repo_data["lower_description"] = digital_humanities_2022_repo_data["description"].str.lower()
digital_humanities_2022_repo_data["lower_name"] = digital_humanities_2022_repo_data["name"].str.lower()

In [58]:
dfs_digital_humanities_2022 = []
for term in new_pedagogy_terms:
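    # Match the term in either the description or the name; na=False treats missing values as non-matches
    in_description = digital_humanities_2022_repo_data["lower_description"].str.contains(term, na=False)
    in_name = digital_humanities_2022_repo_data["lower_name"].str.contains(term, na=False)
    selected_rows = digital_humanities_2022_repo_data[in_description | in_name]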
    print(term, len(selected_rows))
    dfs_digital_humanities_2022.append(selected_rows)

projects 9
project 60
tutorials 2
tutorial 3
school 6
course 29
training 0
materials 3
intro 15
introduction 10
intro to 3
practice 2
practicing 1
toolkit 1
term 4
Experiments 0
University 0
articles 0
study 2
textbook 0
analysis 8
teaching 0
computational study 0
comparative 0
access 1
digital history 0
digital art history 0
digital storytelling 2
digital approaches 0
preserving 0
portfolio 2
virtual restoration 0
scripts 2
exercise 0
workshop 11
instructional 0
assignment 2
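
The lowercase-and-loop steps used here recur for each dataset, so they could be wrapped in one function. The following is a minimal sketch, assuming a hypothetical helper `count_term_matches` and toy rows. The `description` and `name` columns mirror the notebook's CSVs, while `na=False` and `regex=False` are choices added here so that missing descriptions count as non-matches and terms are read literally.

```python
import pandas as pd


def count_term_matches(repo_data, terms):
    """Count rows whose description or name contains each term (case-insensitive)."""
    descriptions = repo_data["description"].str.lower()
    names = repo_data["name"].str.lower()
    counts = {}
    matched_frames = []
    for term in terms:
        term = term.lower()  # capitalized terms would never match lowercased text
        mask = (
            descriptions.str.contains(term, regex=False, na=False)
            | names.str.contains(term, regex=False, na=False)
        )
        counts[term] = int(mask.sum())
        matched_frames.append(repo_data[mask])
    # A repo can match several terms, so drop repeated rows when combining.
    all_matches = pd.concat(matched_frames).drop_duplicates()
    return pd.Series(counts, name="matches"), all_matches


# Toy rows, invented for illustration only.
toy_repos = pd.DataFrame({
    "name": ["dh-course", "coin-notifier", "intro-ca"],
    "description": ["Materials for a DH course", None, "Introduction to Cultural Analytics"],
})
term_counts, matched_repos = count_term_matches(toy_repos, ["course", "intro", "University"])
print(term_counts)
print(len(matched_repos))
```

Returning the de-duplicated matches alongside the counts gives one filtered DataFrame per dataset instead of a list of overlapping ones.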


### Looping Through Cultural Analytics

In [59]:
cultural_analytics_repo_data["lower_description"] = cultural_analytics_repo_data["description"].str.lower()
cultural_analytics_repo_data["lower_name"] = cultural_analytics_repo_data["name"].str.lower()


In [60]:
dfs_cultural_analytics = []
for term in new_pedagogy_terms:
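    # Match the term in either the description or the name; na=False treats missing values as non-matches
    in_description = cultural_analytics_repo_data["lower_description"].str.contains(term, na=False)
    in_name = cultural_analytics_repo_data["lower_name"].str.contains(term, na=False)
    selected_rows = cultural_analytics_repo_data[in_description | in_name]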
    print(term, len(selected_rows))
    dfs_cultural_analytics.append(selected_rows)

projects 5
project 17
tutorials 1
tutorial 1
school 0
course 19
training 2
materials 1
intro 7
introduction 3
intro to 1
practice 1
practicing 0
toolkit 0
term 2
Experiments 0
University 0
articles 3
study 2
textbook 2
analysis 15
teaching 1
computational study 1
comparative 1
access 3
digital history 0
digital art history 0
digital storytelling 0
digital approaches 0
preserving 0
portfolio 11
virtual restoration 0
scripts 2
exercise 0
workshop 0
instructional 0
assignment 13


### Looping Through Digital Cultural Heritage

In [65]:
digital_cultural_heritage_repo_data["lower_description"] = digital_cultural_heritage_repo_data["description"].str.lower()
digital_cultural_heritage_repo_data["lower_name"] = digital_cultural_heritage_repo_data["name"].str.lower()


In [66]:
dfs_digital_cultural_heritage = []
for term in new_pedagogy_terms:
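    # Match the term in either the description or the name; na=False treats missing values as non-matches
    in_description = digital_cultural_heritage_repo_data["lower_description"].str.contains(term, na=False)
    in_name = digital_cultural_heritage_repo_data["lower_name"].str.contains(term, na=False)
    selected_rows = digital_cultural_heritage_repo_data[in_description | in_name]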
    print(term, len(selected_rows))
    dfs_digital_cultural_heritage.append(selected_rows)

projects 1
project 6
tutorials 0
tutorial 0
school 1
course 3
training 0
materials 0
intro 0
introduction 0
intro to 0
practice 1
practicing 0
toolkit 0
term 0
Experiments 0
University 0
articles 0
study 0
textbook 0
analysis 1
teaching 1
computational study 0
comparative 0
access 4
digital history 0
digital art history 1
digital storytelling 1
digital approaches 1
preserving 1
portfolio 1
virtual restoration 1
scripts 1
exercise 0
workshop 1
instructional 0
assignment 0


### Looping Through Public History

In [67]:
public_history_repo_data["lower_description"] = public_history_repo_data["description"].str.lower()
public_history_repo_data["lower_name"] = public_history_repo_data["name"].str.lower()

In [68]:
dfs_public_history = []
for term in new_pedagogy_terms:
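    # Match the term in either the description or the name; na=False treats missing values as non-matches
    in_description = public_history_repo_data["lower_description"].str.contains(term, na=False)
    in_name = public_history_repo_data["lower_name"].str.contains(term, na=False)
    selected_rows = public_history_repo_data[in_description | in_name]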
    print(term, len(selected_rows))
    dfs_public_history.append(selected_rows)

projects 39
project 135
tutorials 2
tutorial 6
school 18
course 28
training 31
materials 7
intro 28
introduction 12
intro to 0
practice 12
practicing 0
toolkit 3
term 60
Experiments 0
University 0
articles 9
study 17
textbook 1
analysis 31
teaching 4
computational study 0
comparative 0
access 49
digital history 0
digital art history 0
digital storytelling 0
digital approaches 0
preserving 2
portfolio 10
virtual restoration 0
scripts 10
exercise 7
workshop 0
instructional 1
assignment 19
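
Putting the per-term counts side by side can make noisy terms easier to spot. For example, `term` and `access` match far more rows in the Public History results than in the other three searches. The sketch below uses a few counts copied from the printed outputs above. The factor of 5 is an arbitrary threshold chosen for illustration, and because raw counts depend on each dataset's size, which is not shown here, this is only a rough screen.

```python
import pandas as pd

# Counts copied from the printed outputs above for a handful of terms.
counts = pd.DataFrame(
    {
        "digital_humanities": {"course": 29, "workshop": 11, "term": 4, "access": 1},
        "cultural_analytics": {"course": 19, "workshop": 0, "term": 2, "access": 3},
        "digital_cultural_heritage": {"course": 3, "workshop": 1, "term": 0, "access": 4},
        "public_history": {"course": 28, "workshop": 0, "term": 60, "access": 49},
    }
)

# Total matches across the three field-specific searches.
field_total = counts.drop(columns="public_history").sum(axis=1)

# Flag terms whose Public History count exceeds 5x the combined field count.
# The factor 5 is an arbitrary threshold chosen for illustration.
noisy = counts.index[counts["public_history"] > 5 * field_total]
print(sorted(noisy))
```

Terms flagged this way could be candidates for the avoid list, though each would still need a manual look at the matched descriptions.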
