Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Added a blog_parser role and fleshed out basic db_utils
* Added a mock client for testing and built out some queries
* Finished all queries and they actually work.
* Updated unittests
* Added myself as a user for tomfoolery
1 parent ffb55f4, commit 38a4946

Showing 14 changed files with 173 additions and 20 deletions.
@@ -1,6 +1,7 @@

```
beautifulsoup4 >= 4.6.3
nltk >= 3.3
pre-commit==1.10.3
psycopg2 >= 2.7.5
redis >= 2.10.6
requests >= 2.19.1
validators
```
@@ -0,0 +1,22 @@

```sql
-- Deploy lexicount:role.blog_parser to pg

BEGIN;

SET ROLE sqitch;

DO
$do$
BEGIN
   IF NOT EXISTS (
      SELECT *
      FROM pg_catalog.pg_roles
      WHERE rolname = 'blog_parser') THEN
      CREATE ROLE blog_parser LOGIN;
   END IF;
END
$do$;

GRANT SELECT, INSERT, UPDATE ON public.blog_details TO blog_parser;
GRANT SELECT, INSERT, UPDATE ON public.word_details TO blog_parser;

COMMIT;
```
@@ -0,0 +1,21 @@

```sql
-- Revert lexicount:role.blog_parser from pg

BEGIN;

SET ROLE sqitch;

DO
$do$
BEGIN
   -- Only revoke and drop when the role actually exists
   IF EXISTS (
      SELECT *
      FROM pg_catalog.pg_roles
      WHERE rolname = 'blog_parser') THEN
      REVOKE ALL PRIVILEGES ON TABLE public.blog_details FROM blog_parser;
      REVOKE ALL PRIVILEGES ON TABLE public.word_details FROM blog_parser;
      DROP ROLE blog_parser;
   END IF;
END
$do$;

COMMIT;
```
@@ -0,0 +1,15 @@

```sql
-- Verify lexicount:role.blog_parser on pg

BEGIN;

-- has_table_privilege() just returns false for a missing grant; dividing 1 by
-- its integer cast raises division_by_zero, so a missing grant fails the verify.
SELECT 1/has_table_privilege('blog_parser', 'public.blog_details', 'UPDATE')::int;
SELECT 1/has_table_privilege('blog_parser', 'public.blog_details', 'INSERT')::int;
SELECT 1/has_table_privilege('blog_parser', 'public.blog_details', 'SELECT')::int;

SELECT 1/has_table_privilege('blog_parser', 'public.word_details', 'UPDATE')::int;
SELECT 1/has_table_privilege('blog_parser', 'public.word_details', 'INSERT')::int;
SELECT 1/has_table_privilege('blog_parser', 'public.word_details', 'SELECT')::int;

ROLLBACK;
```
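If the same privilege checks are also wanted from Python, for example in a smoke test after a deploy, a minimal sketch could look like this. `missing_privileges` is a hypothetical helper, not part of this commit, and it assumes a psycopg2-style cursor that offers `execute(sql, params)` and `fetchone()`.

```python
def missing_privileges(cursor, role, tables, privileges):
    """Return the (table, privilege) pairs that `role` does not hold.

    Uses PostgreSQL's three-argument has_table_privilege(role, table,
    privilege), so no SET ROLE is needed beforehand.
    """
    missing = []
    for table in tables:
        for privilege in privileges:
            # Values go in as parameters; the driver handles the quoting.
            cursor.execute(
                "SELECT has_table_privilege(%s, %s, %s);",
                (role, table, privilege),
            )
            (granted,) = cursor.fetchone()
            if not granted:
                missing.append((table, privilege))
    return missing
```

With a live connection, `missing_privileges(conn.cursor(), 'blog_parser', ['public.blog_details', 'public.word_details'], ['SELECT', 'INSERT', 'UPDATE'])` should return `[]` once the deploy script's grants are in place.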
@@ -0,0 +1,71 @@

```python
import psycopg2 as pg

_WORD_DETAILS_TABLE = 'public.word_details'
_BLOG_DETAILS_TABLE = 'public.blog_details'
_PG_USER = 'blog_parser'
_PG_HOST = '0.0.0.0'
_PG_PORT = 5432
_PG_DB = 'lexicount'

# Values are passed as %s parameters so psycopg2 quotes them; formatting them
# into the string breaks on words such as "don't" and allows SQL injection.
_WORD_UPDATE_QUERY = "INSERT INTO " + _WORD_DETAILS_TABLE + \
    " (word, count, part_of_speech) " \
    "VALUES (%s, 1, %s) " \
    "ON CONFLICT (word) DO UPDATE SET count = " + \
    _WORD_DETAILS_TABLE + ".count + 1;"
_BLOG_UPDATE_QUERY = "INSERT INTO " + _BLOG_DETAILS_TABLE + \
    " (word, count, url) " \
    "VALUES (%s, 1, %s) " \
    "ON CONFLICT (word, url) DO UPDATE SET count = " + \
    _BLOG_DETAILS_TABLE + ".count + 1;"
_GET_WORD_COUNT_QUERY = "SELECT COUNT(DISTINCT word) FROM " + \
    _WORD_DETAILS_TABLE + ";"

_db_conn = pg.connect(host=_PG_HOST,
                      port=_PG_PORT,
                      user=_PG_USER,
                      database=_PG_DB)
_db_cursor = _db_conn.cursor()


def execute_query(query, params=None):
    """
    Runs a query on our pg db and returns every result row, or None
    when the statement produces no result set (e.g. a plain INSERT)
    """
    _db_cursor.execute(query, params)
    try:
        result = _db_cursor.fetchall()
    except pg.ProgrammingError:
        # psycopg2 raises ProgrammingError when there is nothing to fetch
        result = None
    return result


def update_word_details(word, pos):
    """
    Given a word and a part of speech, we update the word_details table
    :param word: it's uh... a word. Pulled from the blog post being parsed
    :param pos: part of speech as determined by NLTK
    """
    execute_query(_WORD_UPDATE_QUERY, (word, pos))


def update_blog_details(word, url):
    """
    Given a word and a url, we update the blog_details table
    :param word: yeah again... it's a word
    :param url: blog's url
    """
    execute_query(_BLOG_UPDATE_QUERY, (word, url))


def get_unique_words():
    """
    Runs a COUNT DISTINCT on the word_details table
    """
    return execute_query(_GET_WORD_COUNT_QUERY)[0][0]


def close_db_connection():
    """
    Closes the cursor, commits pending changes and closes the connection
    """
    _db_cursor.close()
    _db_conn.commit()
    _db_conn.close()
```
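Because this module opens its Postgres connection at import time, tests cannot easily swap in a fake cursor. One option, sketched here under the assumption that the helpers may take the cursor as an argument (a hypothetical refactor, not what this commit does), is to inject the cursor and pass the values as query parameters. `RecordingCursor` is a made-up test double that only mimics psycopg2's `execute(query, vars)` call shape.

```python
# Hypothetical refactor: helpers receive a cursor instead of using a
# module-level one, so a test can hand in any object with execute().
WORD_UPDATE_QUERY = (
    "INSERT INTO public.word_details (word, count, part_of_speech) "
    "VALUES (%s, 1, %s) "
    "ON CONFLICT (word) DO UPDATE SET count = public.word_details.count + 1;"
)


def update_word_details(cursor, word, pos):
    """Upsert one occurrence of `word`; the driver quotes `word` and `pos`."""
    cursor.execute(WORD_UPDATE_QUERY, (word, pos))


class RecordingCursor:
    """Test double that records each (sql, params) pair it receives."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


cursor = RecordingCursor()
update_word_details(cursor, "don't", "VB")
```

Since the word travels separately from the SQL text, the apostrophe in `don't` never ends up inside the query string, and a test can assert on the recorded parameters without a running database.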
@@ -1,8 +1,5 @@

```diff
 from redis import StrictRedis
 
 # Redis stuff
-WORD_DB_ID = 0
-NLTK_DB_ID = 1
-word_client = StrictRedis(db=WORD_DB_ID)
-nltk_client = StrictRedis(db=NLTK_DB_ID)
+word_client = StrictRedis()
 LINKS_KEY = 'blog_links'
```
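@@ -0,0 +1,7 @@

```python
class MockPostgresCursor:
    """Stand-in for a psycopg2 cursor that records SQL instead of running it."""

    def __init__(self, query_cache=None):
        # Fresh set per instance; a set([]) default would be shared by all of them
        self.query_cache = set() if query_cache is None else query_cache

    def execute(self, sql, params=None):
        # params mirrors psycopg2's cursor.execute(query, vars=None) signature
        print('Received the following sql: %s' % sql)
        self.query_cache.add(sql)
```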
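A default value such as `query_cache=set([])` is evaluated once, when the `def` statement runs, so every cursor built without an explicit cache shares one set, and queries recorded in one test can leak into the next. The sketch below contrasts that with a `None` sentinel as two small `unittest` cases; `SharedDefaultCursor` and `IsolatedCursor` are illustrative names, not classes from this repository.

```python
import unittest


class SharedDefaultCursor:
    # The default set is created once, at definition time, and reused.
    def __init__(self, query_cache=set()):
        self.query_cache = query_cache

    def execute(self, sql):
        self.query_cache.add(sql)


class IsolatedCursor:
    # A None sentinel gives each instance its own fresh set.
    def __init__(self, query_cache=None):
        self.query_cache = set() if query_cache is None else query_cache

    def execute(self, sql):
        self.query_cache.add(sql)


class MutableDefaultTest(unittest.TestCase):
    def test_shared_default_leaks_between_instances(self):
        first, second = SharedDefaultCursor(), SharedDefaultCursor()
        first.execute('SELECT 1;')
        # second never executed anything, yet it sees first's query
        self.assertIn('SELECT 1;', second.query_cache)

    def test_sentinel_default_isolates_instances(self):
        first, second = IsolatedCursor(), IsolatedCursor()
        first.execute('SELECT 1;')
        self.assertEqual(second.query_cache, set())
```

Both cases pass under `python -m unittest`, which is exactly the problem: the first one documents a leak that a suite relying on a fresh cache per test would trip over.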