# Parsing raw data from The Pile

## Processing data files

1. Import modules to use: `jsonlines` and `pandas`

In [18]:
# -*- coding: utf-8 -*-
import jsonlines
import pandas as pd
from collections import namedtuple
from pathlib import Path

2. Declare constants:
    - desired data sets from The Pile
    - path to raw data file

In [19]:
bookCorp = ('Gutenberg (PG-19)', 'Books3', 'BookCorpus2')
webCorp = ('Pile-CC', 'OpenWebText2')
selectCorp = bookCorp + webCorp

pile_data_path = Path('pile_data/test-main.jsonl')

_Just a helper function to get an overview of the data frame:_

In [20]:
def assess_df(df: pd.DataFrame): 
    print(df.info())
    print('---')
    print(df.describe())
    print('---\n(Memory Usage)')
    print(df.memory_usage(deep=True))

_Function to flag texts containing non-ASCII characters (a rough proxy for non-English text). From [this Stack Overflow answer](https://stackoverflow.com/a/27084708)._

In [21]:
def isEnglish(s):
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

# assert not isEnglish('slabiky, ale liší se podle významu')
# assert isEnglish('English')
# assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
# assert not isEnglish('how about this one : 通 asfަ')
# assert isEnglish('?fd4))45s&')
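
The check above tests for ASCII-only text rather than for English as such. On Python 3.7+, the built-in `str.isascii()` should give the same answer for ordinary strings without the encode/decode round trip. A minimal sketch (the name `is_ascii` is hypothetical and not used elsewhere in this notebook):

```python
def is_ascii(s: str) -> bool:
    """Return True when every character in `s` is in the ASCII range."""
    return s.isascii()


# Same kinds of cases as the commented-out asserts above.
assert is_ascii('English')
assert is_ascii('?fd4))45s&')
assert not is_ascii('slabiky, ale liší se podle významu')
assert not is_ascii('how about this one : 通 asf')
# Typographic punctuation also fails the check, even in English prose
# (\u2019 is a curly apostrophe).
assert not is_ascii('I\u2019m overly cautious')
```

The last case matters for the counts further down: English text with curly quotes or dashes lands in the "non-English" group.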

3. Define a namedtuple to simplify dataframe creation from the JSON objects.

In [22]:
text_info = namedtuple('Text', ['text', 'pile_set_name'])

4. Load the (sample) jsonlines formatted (`.jsonl`) file using `jsonlines`.
5. Create a generator object which directly filters out texts from unwanted data sets.
6. Use pandas to create a flattened dataframe from the generator.

In [23]:
with pile_data_path.open(encoding='utf-8-sig', mode='r') as jlf:
    jlines = jsonlines.Reader(jlf).iter()
    texts = (text_info(d['text'], d['meta']['pile_set_name']) 
             for d in jlines if d['meta']['pile_set_name'] in selectCorp)
    # This has to be done before the file is closed: 
    #   Since we're using a generator to speed things up, the data is not fully 
    #   loaded into the workspace until it's put into the dataframe.
    df = pd.DataFrame(texts)

# since this is just illustrative, go ahead and trim
df = df.sample(int(len(df)/2))
assess_df(df)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43007 entries, 4577 to 4387
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   text           43007 non-null  object
 1   pile_set_name  43007 non-null  object
dtypes: object(2)
memory usage: 1008.0+ KB
None
---
         text pile_set_name
count   43007         43007
unique  43005             5
top                 Pile-CC
freq        2         26326
---
(Memory Usage)
Index               344056
text             478020972
pile_set_name      2835312
dtype: int64
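
To see the filtering step in isolation, here is a small sketch that uses only the standard library (`json` and `io.StringIO` stand in for `jsonlines` and the real file). The three records and the names `sample_jsonl`, `TextInfo`, `wanted` and `kept` are made up for illustration; only the record shape (`text`, `meta.pile_set_name`) is taken from the cell above.

```python
import io
import json
from collections import namedtuple

# Hypothetical three-record sample in the same shape as the Pile records.
sample_jsonl = io.StringIO(
    '{"text": "a web page", "meta": {"pile_set_name": "Pile-CC"}}\n'
    '{"text": "some code", "meta": {"pile_set_name": "Github"}}\n'
    '{"text": "a novel", "meta": {"pile_set_name": "Books3"}}\n'
)

TextInfo = namedtuple('TextInfo', ['text', 'pile_set_name'])
wanted = ('Books3', 'Pile-CC')

# Parse one line at a time and keep only records from the wanted subsets.
records = (json.loads(line) for line in sample_jsonl)
kept = [TextInfo(d['text'], d['meta']['pile_set_name'])
        for d in records if d['meta']['pile_set_name'] in wanted]

print(kept)
```

Passing a sequence of namedtuples like `kept` to `pd.DataFrame` yields one column per field, which is how the `text` and `pile_set_name` columns in the output above get their names.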


7. Clean it up a bit: remove duplicate text items, then split off the texts containing non-ASCII characters.

In [24]:
df = df.drop_duplicates(subset='text').reset_index(drop=True)

df = df.assign(pile_set_name=df.pile_set_name.astype('category'),
               text = df.text.astype('string'))
# Compute the ASCII check once and reuse it for both subsets.
ascii_mask = pd.Series([isEnglish(t) for t in df.text], index=df.index)
altdf = df[~ascii_mask]
engdf = df[ascii_mask]
assess_df(engdf)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14539 entries, 0 to 42999
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   text           14539 non-null  string  
 1   pile_set_name  14539 non-null  category
dtypes: category(1), string(1)
memory usage: 241.6 KB
None
---
                                                     text pile_set_name
count                                               14539         14539
unique                                              14539             5
top     My housemates appear to have just left without...       Pile-CC
freq                                                    1         10105
---
(Memory Usage)
Index              116312
text             49194224
pile_set_name       15049
dtype: int64


In [25]:
assess_df(altdf)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28466 entries, 1 to 43004
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   text           28466 non-null  string  
 1   pile_set_name  28466 non-null  category
dtypes: category(1), string(1)
memory usage: 472.8 KB
None
---
                                                     text pile_set_name
count                                               28466         28466
unique                                              28466             5
top     I’m overly cautious about download managers, e...       Pile-CC
freq                                                    1         16221
---
(Memory Usage)
Index               227728
text             644854732
pile_set_name        28976
dtype: int64


In [26]:
sample_nonlatin = altdf.sample(5)
# sample_nonlatin.to_csv('original_nonlatin.csv')
print(sample_nonlatin.iat[0,0][0:800])

About This Game

Red Baron Manual. This digital scan of the original Red Baron manual contains a wealth of information, including detailed history on World War I along with its airplanes, aces and flight strategies.



This digital scan of the original Red Baron manual contains a wealth of information, including detailed history on World War I along with its airplanes, aces and flight strategies. Red Baron Maps. The historically accurate World War I maps originally bundled with Red Baron are included as digital scans.



The historically accurate World War I maps originally bundled with Red Baron are included as digital scans. Red Baron Reference Cards. The original reference cards accompanying Red Baron are included as digital scans.

Red Baron 3D Manual. Like its original counterpart, this digital scan of the Red Baron 3D manual is filled with game tips, strategies and World War I history.



Like its original counterpart, this digital scan of the Red Baron 3D manual is filled with g
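
The first 800 characters printed above read as ordinary English, so whatever made this text fail the ASCII check presumably sits elsewhere in the text or is typographic (curly quotes, dashes and the like). A small sketch for listing the offending characters follows. The helper `non_ascii_chars` and the example string are hypothetical and were not run on the Pile data.

```python
import unicodedata
from collections import Counter


def non_ascii_chars(text: str) -> Counter:
    """Count each non-ASCII character in `text` (hypothetical helper)."""
    return Counter(ch for ch in text if ord(ch) > 127)


# Made-up example: English prose with curly apostrophes and an en dash.
example = 'I\u2019m overly cautious \u2013 it\u2019s a habit.'
for ch, n in non_ascii_chars(example).most_common():
    # Print the code point, its Unicode name, and how often it occurs.
    print(f'U+{ord(ch):04X} {unicodedata.name(ch, "UNKNOWN")} x{n}')
```

Running something like `non_ascii_chars(sample_nonlatin.iat[0, 0])` would show whether a flagged text is really non-English or just uses typographic punctuation.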

In [27]:
from unidecode import unidecode
translated_nonlatin = sample_nonlatin.assign(text = sample_nonlatin.text.apply(unidecode))
# translated_nonlatin.to_csv('translated_nonlatin.csv')
print(translated_nonlatin.iat[0,0][0:800])

About This Game

Red Baron Manual. This digital scan of the original Red Baron manual contains a wealth of information, including detailed history on World War I along with its airplanes, aces and flight strategies.



This digital scan of the original Red Baron manual contains a wealth of information, including detailed history on World War I along with its airplanes, aces and flight strategies. Red Baron Maps. The historically accurate World War I maps originally bundled with Red Baron are included as digital scans.



The historically accurate World War I maps originally bundled with Red Baron are included as digital scans. Red Baron Reference Cards. The original reference cards accompanying Red Baron are included as digital scans.

Red Baron 3D Manual. Like its original counterpart, this digital scan of the Red Baron 3D manual is filled with game tips, strategies and World War I history.



Like its original counterpart, this digital scan of the Red Baron 3D manual is filled with g

In [28]:
df.loc[:,'text'] = df.text.apply(unidecode)

8. Create codes for data subsets

In [29]:
subset_abbr_dict = {'Gutenberg (PG-19)': 'PG19',
                    'Books3': 'Bks3',
                    'BookCorpus2': 'BkC2',
                    'Pile-CC': 'PiCC',
                    'OpenWebText2': 'OWT2'}
codes = (subset_abbr_dict[n] for n in df.pile_set_name)
df = df.assign(pile_set_code=pd.Categorical(codes))

9. Create text ids from raw file name, pile subset code, and dataframe index.

In [30]:
datastem = pile_data_path.stem
codedf = pd.DataFrame()
for code in df.pile_set_code.unique(): 
    subdf = df.loc[df.pile_set_code == code, :].reset_index()
    prefs = code + '_' + datastem + '_'  
    idnums = subdf.index.astype('string')
    width = len(str(df.index.max()))
    idnums = idnums.str.zfill(width)
    subdf = subdf.assign(text_id= prefs + idnums)
    
    codedf = pd.concat([codedf, subdf])

df = codedf
df.text_id

0     PiCC_test-main_00000
1     PiCC_test-main_00001
2     PiCC_test-main_00002
3     PiCC_test-main_00003
4     PiCC_test-main_00004
              ...         
9     BkC2_test-main_00009
10    BkC2_test-main_00010
11    BkC2_test-main_00011
12    BkC2_test-main_00012
13    BkC2_test-main_00013
Name: text_id, Length: 43005, dtype: object
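
As a possible alternative to the concat loop, the per-subset counter can be built with `groupby(...).cumcount()`. This is only a sketch on a made-up five-row frame (`demo`, `stem`, `within` and `width` are illustrative names). Unlike the loop above, it keeps the rows in their original order instead of grouping them by subset.

```python
import pandas as pd

# Hypothetical miniature frame with the same column name as the notebook.
demo = pd.DataFrame({'pile_set_code': ['PiCC', 'OWT2', 'PiCC', 'BkC2', 'OWT2']})
stem = 'test-main'

# Number the rows within each subset, starting from 0, in their existing order.
within = demo.groupby('pile_set_code').cumcount()

# Pad to the width of the largest possible row number in the whole frame.
width = len(str(len(demo) - 1))
demo['text_id'] = (demo['pile_set_code'] + '_' + stem + '_'
                   + within.astype(str).str.zfill(width))
print(demo)
```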

A sample: 

In [31]:
pd.set_option('display.max_colwidth', 200)
# .copy() gives an independent frame, so the assignment below does not
# trigger pandas' SettingWithCopyWarning.
df = df[['text_id', 'text', 'pile_set_name', 'pile_set_code']].copy()
df.loc[:, ['text_id', 'text']] = df.loc[:,['text_id', 'text']].astype('string')
df.sample(20)


Unnamed: 0,text_id,text,pile_set_name,pile_set_code
13866,OWT2_test-main_13866,"This transgender mania began with the sudden obsession and mass glorification of a celebrity who decided to switch genders. You may remember the orgiastic reaction by the leftist media: ""Bruce Jen...",OpenWebText2,OWT2
2810,PiCC_test-main_02810,"The Zoning Plan The Zoning Plan (POT) for Bogota, a city with 7.9 million inhabitants, is drafted and adopted The Zoning Plan is a technical and regulatory instrument used to organize the munici...",Pile-CC,PiCC
8971,OWT2_test-main_08971,"Jing Ji 3058 [(ITDao Ru Bu Zhu Jin no3Ci Gong Mu gaKai Shi sareteorimasu!)] Qi Ye Zhi Yuan puratsutohuomudeha, Jie Yang noShi Ye niYi Li tsuBu Zhu Jin wohazimetosuruYou Yi naZui Xin Qing Bao woF...",OpenWebText2,OWT2
17342,PiCC_test-main_17342,Description TheRoland RMIDI B10 Mk2 3m is a highly durable and high-end MIDI cable and is a great quality professional Cable suitable for both on stage and in the studio. The Roland RMIDI B10 Mid...,Pile-CC,PiCC
21877,PiCC_test-main_21877,"PALO ALTO, Calif. -- Living the fantasy of every homeowner who's faced the prospect of a nuisance project next door, Facebook's Mark Zuckerberg has bought four homes adjacent to his own 5-bedroom ...",Pile-CC,PiCC
26055,PiCC_test-main_26055,"Vincent Diepeveen wrote: > Bzip2, gzip, >> Why do you guys keep quoting those total outdated compressors :) Path of least resistance, not to mention python bindings. > there is 7-zip for linux, it...",Pile-CC,PiCC
24825,PiCC_test-main_24825,"Inside The Bills Before 2009 toxic differential was an unknown metric. Devised by former Baltimore Ravens head coach Brian Billick, it takes turnover differential and big play differential and ad...",Pile-CC,PiCC
16149,OWT2_test-main_16149,"Insomniac is quickly preparing everyone for their newest event, EDC Mexico. It was an early Christmas when they announced that they were taking EDC to Mexico this coming up March and now they just...",OpenWebText2,OWT2
13994,OWT2_test-main_13994,McLaren racing director Eric Boullier is hoping the team's partnership with Renault will yield at least one race win next season. The Woking-based outfit begins a new chapter in its existence aft...,OpenWebText2,OWT2
538,PiCC_test-main_00538,I am sitting still with the Full August Moon at my back. She is high in the sky and refuses to yield to darkness for the the annual Perseid meteor shower. I am undaunted. I know the light from the...,Pile-CC,PiCC


In [32]:
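# Note: this ID uses the stem 'test', but the IDs built above use 'test-main',
# so the selection is empty and the check below runs on the printed form of an
# empty Series rather than on an actual text.
problem = df[df.text_id == 'OWT2_test_11913']
print(problem.text)
isEnglish(str(problem.text))
# (now translated into ascii via unidecode())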

Series([], Name: text, dtype: string)


True

Just to sanity check: the only texts kept from the raw input file are in the desired pile data subsets.

In [33]:
df.pile_set_name.unique().to_list()

['Pile-CC', 'OpenWebText2', 'Gutenberg (PG-19)', 'Books3', 'BookCorpus2']

Split the filtered data frame up into separate dataframes for each of the specified categories.

In [34]:
web_select = df[df.pile_set_name.isin(webCorp)]
web_select

Unnamed: 0,text_id,text,pile_set_name,pile_set_code
0,PiCC_test-main_00000,"My housemates appear to have just left without saying goodbye so I'm turning to my other source of companionship, my blog. Having finally done most of the chores I need to do today I feel I can st...",Pile-CC,PiCC
1,PiCC_test-main_00001,"SYRACUSE, UT - Onions 52 had a whirlwind 2017, launching a company-wide rebrand in the last few months of the year that saw a new look, a new name, and so much more. Prepped with a brand that had ...",Pile-CC,PiCC
2,PiCC_test-main_00002,"Archive ananyo writes ""The Norfolk Constabulary has closed its investigation into the November 2009 release of private emails between researchers at the Climatic Research Centre at the University...",Pile-CC,PiCC
3,PiCC_test-main_00003,"China, a dirty game player! Go to page MEMBER Joined Jan 4, 2015 Messages 206 Reaction score 11 Country Location China is overly imposing its superpower capability with its waging terri...",Pile-CC,PiCC
4,PiCC_test-main_00004,Debugging etcd Migration etcd gateway What is etcd gateway etcd gateway is a simple TCP proxy that forwards network data to the etcd cluster. The gateway is stateless and transparent; it neith...,Pile-CC,PiCC
...,...,...,...,...
16509,OWT2_test-main_16509,"The Daily Mail has published a rubbish piece by Michael Howard, former leader of Britain's Conservative party, attacking Donald Trump, claiming that man-made global warming is real and that Margar...",OpenWebText2,OWT2
16510,OWT2_test-main_16510,Toronto Body of woman with 'obvious signs of trauma' found in North York park Share on Facebook Share on Twitter Share by Email Police are looking to speak to anyone who visited Derrydowns Park...,OpenWebText2,OWT2
16511,OWT2_test-main_16511,"Chinese filmmakers used to have a hard time finding financing; now investors can't wait to get into show business. The film sector is one of few boom industries in China, with an annual growth ra...",OpenWebText2,OWT2
16512,OWT2_test-main_16512,"DAGUPAN CITY -- A 70-year-old American was shot dead on Saturday in San Jacinto town in Pangasinan province. Danny Blaylock, a retired United States Navy soldier, was walking home in Barangay (vi...",OpenWebText2,OWT2


In [35]:
book_select = df[df.pile_set_name.isin(bookCorp)]
book_select

Unnamed: 0,text_id,text,pile_set_name,pile_set_code
0,PG19_test-main_00000,"JUST DAVID BY ELEANOR H. (HODGMAN) PORTER AUTHOR POLLYANNA, MISS BILLY MARRIED, ETC.  TO  MY FRIEND  Mrs. James Harness CONTENTS  I. THE MOUNTAIN HOME  II. TH...",Gutenberg (PG-19),PG19
1,PG19_test-main_00001,E-text prepared by Larry B. Harrison and the Project Gutenberg Online Distributed Proofreading Team (http://www.pgdp.net) Note: Project Gutenberg also has an HTML version of this  file w...,Gutenberg (PG-19),PG19
2,PG19_test-main_00002,"Produced by David Widger THE GREAT AMERICAN FRAUD By Samuel Hopkins Adams A Series of Articles on the Patent Medicine Evil, Reprinted from Collier's Weekly  I-----The Great Ame...",Gutenberg (PG-19),PG19
3,PG19_test-main_00003,"Produced by D.R. Thompson HISTORY OF FRIEDRICH II. OF PRUSSIA FREDERICK THE GREAT By Thomas Carlyle APPENDIX. This Piece, it would seem, was translated sixteen years ago; some four...",Gutenberg (PG-19),PG19
4,PG19_test-main_00004,Produced by Martin Robb. For the Temple: A Tale of the Fall of Jerusalem By G. A. Henty. Contents Preface. Chapter 1: The Lake Of Tiberias. Chapter 2: A Storm On Galilee. Chapter 3: ...,Gutenberg (PG-19),PG19
...,...,...,...,...
9,BkC2_test-main_00009,Life Sciences and Chemical Patent Practice in Canada: A Practical Guide Third Edition Smashwords Edition 2011 Borden Ladner Gervais LLP Canada Life Sciences and Chemical Patent Practice in...,BookCorpus2,BkC2
10,BkC2_test-main_00010,"The Wings of a Broken Bird: By Christine Wood Copyright (c) by C Wood 2020 Any resemblance, to people, events, and places, Written within the pages of this book, is purely coincidental. As...",BookCorpus2,BkC2
11,BkC2_test-main_00011,Diary of a FRENCH academic (Summer 2010) Jean-Philippe DENIS Copyright Jean-Philippe DENIS 2010 With Smashwords Edition License Notes This ebook is licensed for your personal enjoyment onl...,BookCorpus2,BkC2
12,BkC2_test-main_00012,### Visits From Beyond ### True Stories of After Death Encounters ### C.A. Starfire **Second Smashwords Edition July 2012** (c) 2012 C.A. Starfire This eBook is licensed for your personal e...,BookCorpus2,BkC2


## Parsing Cleaned Data

It looks like the texts from the book corpora are going to:

1. be much longer text strings (probably harder on memory resources for parsing)
2. have funkier formatting

So, let's back up and just do the web corpora, and for simplicity, just `Pile-CC` texts for now.

The `Pile-CC` data can be pulled from the `web_select` dataframe: 

In [36]:
# pcc = web_select[web_select.pile_set_code == 'PiCC']
# temporary intermediate save to speed up debugging
# pcc.to_pickle('pcc_table.pkl.gz', compression='gzip')
from pathlib import Path
import pandas as pd
pcc = pd.read_pickle('pcc_table.pkl.gz', compression='gzip')
pcc

Unnamed: 0,text_id,text,pile_set_name,pile_set_code
0,PiCC_test_00000,Mud Hens pitcher Evan Reed charged with sexual assault Mud Hens pitcher Evan Reed was charged July 30 with sexual assault related to a March incident in Detroit when he was a member of the Detroi...,Pile-CC,PiCC
1,PiCC_test_00001,"I'm getting about the same thing trying to update ""tf"" (team fortress 2) on Ubuntu 7.10 (just updated it yesterday). DSL connection near Seattle, WA. Come to think of it, might have been ""Connecti...",Pile-CC,PiCC
2,PiCC_test_00002,"Mounting tensions with Syria sink US stocks NEW YORK (AP) -- Fears of an escalating conflict in Syria rippled across financial markets on Tuesday, sinking stocks, lifting gold and pushing the pri...",Pile-CC,PiCC
3,PiCC_test_00003,"Upcoming Events Catholic Theologians Call to Abolish the Death Penalty In the wake of the September 21st executions of Troy Anthony Davis in Georgia and Lawrence Brewer in Texas, over 350 Cathol...",Pile-CC,PiCC
4,PiCC_test_00004,"Tag Archives: west texas Post navigation In the summer of 1980, if I remember right, we traveled from Kansas to northern Arkansas to visit my Dad's older brother, Uncle Don. He, my Aunt Mary and...",Pile-CC,PiCC
...,...,...,...,...
52785,PiCC_test_52785,"These are all examples of street harassment. It's a serious problem, and yet it happens every day in every city around the globe. Sometimes people don't recognize the severity of the issue. Men mi...",Pile-CC,PiCC
52786,PiCC_test_52786,"There is an online service called fiverr. fiverr lets you hire professionals in all different industries for a low price. The turn around times are a bit longer, and you only get so many revisions...",Pile-CC,PiCC
52787,PiCC_test_52787,Milton Friedman had no idea that his six-day trip to Chile in March 1975 would generate so much controversy. He was invited to Santiago by a group of Chilean economists who over the previous decad...,Pile-CC,PiCC
52788,PiCC_test_52788,"Age may not hurt it as said. What will hurt you tring to sell it, is the fact that no one will ship that package anymore. Nada, Zip, fineto. Must be done face to face. Hope that helps you all out ...",Pile-CC,PiCC


In [37]:
pcc.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 52790 entries, 0 to 52789
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   text_id        52790 non-null  string  
 1   text           52790 non-null  string  
 2   pile_set_name  52790 non-null  category
 3   pile_set_code  52790 non-null  category
dtypes: category(2), string(2)
memory usage: 1.3 MB


Now we can iterate over the `text` column series and parse each text. (After importing the necessary modules, of course.)

*Note that Stanford CoreNLP needs to be installed. I had trouble with using `wget` for the "latest version" and with the
`git-lfs` steps suggested on GitHub (returned empty zips), so I wound
up manually downloading from [Hugging Face](https://huggingface.co/stanfordnlp/CoreNLP/blob/main/stanford-corenlp-latest.zip) and unzipping.*

In [38]:
from nltk.parse.corenlp import CoreNLPServer, CoreNLPDependencyParser


Then we need to set up the server and tell it where the following files are:
- `stanford-corenlp-X.X.X.jar`
- `stanford-corenlp-X.X.X-models.jar`

If you downloaded and unzipped in your home directory, the path(s) can be set like this.

In [39]:
stanford_dir_path = Path.home().joinpath('stanford-corenlp-4.3.1')
base_path = stanford_dir_path.joinpath(f'{stanford_dir_path.name}.jar')
models_path = stanford_dir_path.joinpath(f'{stanford_dir_path.name}-models.jar')
if not (base_path.exists() and models_path.exists()): 
    print('required jar files not found!!')
server = CoreNLPServer(str(base_path), str(models_path), port=9001)


In [40]:
import os

Try to start the server...

In [41]:
server.start()
os.system(f'java -mx4g -cp "{stanford_dir_path.joinpath("*")}" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9001 -timeout 15000')

CoreNLPServerError: Could not connect to the server.

Hmm, this doesn't work. The command-line tool seems to need a text file as input... consider spaCy instead?

In [None]:


for text in pcc.text[0:5]: 
    parse_command = (f'echo "{text}" | java -cp "{stanford_dir_path}*" edu.stanford.nlp.pipeline.StanfordCoreNLP '
                 f'-annotators tokenize,ssplit,pos,depparse -file {text} -outputFormat conll -output.includeText True -output.columns idx,word,lemma,pos,ner,headidx,deprel')
    # dep_text = text # TODO : add parsing steps here
    os.system(parse_command)

Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLP
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP
sh: 24: Mud: not found
sh: 26: Evan: not found
sh: 28: Reed,: not found
sh: 43: Syntax error: Unterminated quoted string
sh: 5: Syntax error: "(" unexpected
sh: 13: The law of unintended consequences and the history of previous military interventions in the region is not a recipe for political and economic stability,: not found
sh: 15: P: not found
sh: 15: The: not found
sh: 19: While: not found
sh: 21: People worry about this becoming a worst-case scenario and turning into a regional conflict,: not found


Mounting tensions with Syria sink US stocks

NEW YORK (AP) -- Fears of an escalating conflict in Syria rippled across financial markets on Tuesday, sinking stocks, lifting gold and pushing the price of oil to the highest in a year and a half.

The increasing possibility of U.S. military strikes raised worries on Wall Street that energy trade in the region could be disrupted, raising fuel costs for consumers and business.

If Syria becomes drawn out and becomes a long-term issue, its going to show up in things like gas prices," said Chris Costanzo, investment officer with Tanglewood Wealth Management.

The Dow Jones industrial average fell 170.33 points, or 1.1 percent, to 14,776.13, the lowest in two months.

The Standard & Poors 500 index lost 26.30 points, or 1.6 percent, to 1,630.48 and the Nasdaq composite fell 79.05 points, or 2.2 percent, to 3,578.52.


sh: 23: Energy: not found
sh: 29: J.C.: not found
sh: 47: The law of unintended consequences and the history of previous military interventions in the region is not a recipe for political and economic stability,: not found
sh: 49: P: not found
sh: 49: The: not found
sh: 53: While: not found
sh: 55: People worry about this becoming a worst-case scenario and turning into a regional conflict,: not found
sh: 57: Energy: not found
sh: 63: J.C.: not found
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLP
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP
sh: 9: Catholic: not found
sh: 13: Syntax error: Unterminated quoted string
sh: 29: Richard: not found
sh: 33: Ive often wondered how to best describe Bradbury as a writer to those unfamiliar with his work (my two favorite Bradbury books are The Illustrated Man and Fahrenheit 451). Perhaps Stephen King said it best: with Mr. Bradbury, everythings: not found
sh: 35: The Joy 

Tag Archives: west texas

Post navigation

In the summer of 1980, if I remember right, we traveled from Kansas to northern Arkansas to visit my Dad's older brother, Uncle Don. He, my Aunt Mary and my cousins lived in Harrison, near Dogpatch. I also remember something about getting some Cavender seasoning, since it's made in Harrison. (I still use it today, although I prefer the salt-free form).

As we traveled to Harrison, my seven-year-old mind seemed to record us being on some sort of mountainous hill. One road went to Harrison while another road seemed to lead to another town down in a distant valley. A look at a map reveals it might've been Omaha, Arkansas.

To this day, 32 years later, I still wonder about that town. What was its name? What secrets did it hold? What stories did it tell? Or, did it exist solely in my imagination?

I have resurrected and transported the town approximately 800 miles southwest into West Texas in a short story I am working on, titled Garth, Texas. In t

sh: 49: Up: not found
sh: 51: Richard: not found
sh: 59: To: not found
sh: 61: I: not found
sh: 61: what: not found
sh: 63: But: not found
sh: 65: Forget: not found
sh: 77: I: not found
sh: 83: Syntax error: "(" unexpected
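
Reading the errors above: the `sh: ... not found` lines suggest the shell is interpreting the document text itself, because the text is interpolated into the command string. The `ClassNotFoundException` is consistent with the classpath `"{stanford_dir_path}*"` lacking a path separator before the `*`. One possible way around both is to write each text to a temporary file and pass the arguments as a list, so no shell is involved. This is a sketch under those assumptions. `build_parse_command` is a hypothetical helper, the flags are the ones from the cell above (with `-file` pointing at the temporary file and the `-output.*` options left out), and it has not been run against CoreNLP here.

```python
import tempfile
from pathlib import Path


def build_parse_command(corenlp_dir: Path, text_file: Path) -> list:
    """Build an argument list for the CoreNLP command-line pipeline."""
    return [
        'java', '-cp', str(corenlp_dir / '*'),  # note the separator before '*'
        'edu.stanford.nlp.pipeline.StanfordCoreNLP',
        '-annotators', 'tokenize,ssplit,pos,depparse',
        '-file', str(text_file),
        '-outputFormat', 'conll',
    ]


# Made-up text with characters that would trip up a shell command string.
text = 'Mud Hens pitcher "Evan Reed" (charged) & more'
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / 'PiCC_test_00000.txt'
    text_file.write_text(text, encoding='utf-8')
    cmd = build_parse_command(Path.home() / 'stanford-corenlp-4.3.1', text_file)
    print(cmd)
    # To actually run it (requires Java and the CoreNLP jars):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Because the text never passes through a shell, quotes, parentheses and ampersands in the documents need no escaping.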
