# Counting occurrences of words in a file

In this practice, we open a text file and process the words it contains.

### Data File: 
File path: **`/dsa/data/all_datasets/hamilton-federalist-548.txt`**

Contains the text of the Federalist papers written by Alexander Hamilton.

### Algorithm

An algorithm is essentially a set of rules/steps that are followed in solving a problem. 

1. Open File
2. Parse file into a list of lists, variable named: `file_data`, where every line is a list of words.
3. Iterate through `file_data` and count two separate things:
   * Number of lines that contain the word `UNION` in any case
   * Number of instances of the word `UNION` in any case
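
The two counts differ whenever a line mentions the word more than once. The sketch below walks through steps 2 and 3 on a few in-memory lines instead of the real file; `sample_lines` is a hypothetical stand-in invented for illustration, not an excerpt guaranteed to match the data file.

```python
# Hypothetical sample lines standing in for the contents of the data file
sample_lines = [
    "the existence of the UNION, the safety and welfare\n",
    "a more perfect Union, and the Union of the States\n",
    "nothing to see on this line\n",
]

# Step 2: parse into a list of lists, one list of words per line
file_data = [line.split(' ') for line in sample_lines]

# Step 3: count matching lines and matching words separately
line_count = 0
word_count = 0
for words in file_data:
    # number of words on this line that contain 'union' in any case
    matches = sum(1 for word in words if 'union' in word.lower())
    word_count += matches
    if matches > 0:
        line_count += 1

print(line_count, word_count)  # prints: 2 3
```

The second sample line contains the word twice, so it adds one to the line count but two to the word count.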

**Hints:** Check the [File Input and Output](../resources/FilesIO.ipynb) reference notebook for more syntax and help.
To read the file, we will use the `with` control statement as described in Lesson M2-L1 on [Flow Control](../../Module2/labs/M2-L1-ControlStructures.ipynb#The-with-control-statement), as reviewed below.

```python
with open('test.file', 'r') as fh:  # `test.file` can be replaced with any file path;
                                    # `fh` is a file handle object, and you can give it any name
    lines = fh.readlines()

for line in lines:
    print(line)
```

In [Strings and Dictionaries](../../Module1/labs/M1-L3-DS-Strings-Dictionaries.ipynb) we learned the string method `.split`, and in [Strings and Data Structures](../../Module1/practices/M1-P2-Data-Structures.ipynb) we used it on a string input with a `:` colon delimiter. This time we are splitting between words, so we split on a space `' '`.
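
Splitting on a single space has a few side effects worth seeing before running it on the file. The sketch below uses a hypothetical line (made up for illustration) with a leading space and a double space, and compares `.split(' ')` with the argument-free `.split()`, which this notebook does not use but which is a standard alternative.

```python
# Hypothetical line with a leading space, a double space, and a trailing newline
line = ' nothing less than the  UNION,\n'

by_space = line.split(' ')     # split on every single space character
by_whitespace = line.split()   # split on runs of any whitespace, dropping empty strings

print(by_space)       # ['', 'nothing', 'less', 'than', 'the', '', 'UNION,\n']
print(by_whitespace)  # ['nothing', 'less', 'than', 'the', 'UNION,']
```

With `' '` as the delimiter, empty strings appear wherever spaces are adjacent or lead the line, and the newline stays attached to the last word. Neither version removes punctuation such as the comma.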

In [2]:
# Open file and parse into 
#  file_data in this cell
# ---------------------------------

#hint, file_data starts as an empty list
file_data = []

# hint: each item in this list will be a list of words; we build it by splitting each line
# and appending the result to the list
with open('/dsa/data/all_datasets/hamilton-federalist-548.txt', 'r') as fh:
    lines = fh.readlines()


In [3]:
# Examine lines to figure what to do with it
print(type(lines))
print(lines[:10])
print("-" * 100)

filepath = "/dsa/data/all_datasets/hamilton-federalist-548.txt"
# We can loop over each line by iterating over the file handle directly
with open(filepath, 'r') as fh:
    for line in fh:
        print(line)

<class 'list'>
['FEDERALIST. No. 1\n', '\n', 'General Introduction\n', 'For the Independent Journal.\n', '\n', 'HAMILTON\n', '\n', 'To the People of the State of New York:\n', 'AFTER an unequivocal experience of the inefficiency of the\n', ' subsisting federal government, you are called upon to deliberate on\n']
----------------------------------------------------------------------------------------------------
FEDERALIST. No. 1



General Introduction

For the Independent Journal.



HAMILTON



To the People of the State of New York:

AFTER an unequivocal experience of the inefficiency of the

 subsisting federal government, you are called upon to deliberate on

 a new Constitution for the United States of America. The subject

 speaks its own importance; comprehending in its consequences

 nothing less than the existence of the UNION, the safety and welfare

 of the parts of which it is composed, the fate of an empire in many

 respects the most interesting in the world. It has been


 which the States observe or disregard at their option.

It is a singular instance of the capriciousness of the human

 mind, that after all the admonitions we have had from experience on

 this head, there should still be found men who object to the new

 Constitution, for deviating from a principle which has been found

 the bane of the old, and which is in itself evidently incompatible

 with the idea of GOVERNMENT; a principle, in short, which, if it is

 to be executed at all, must substitute the violent and sanguinary

 agency of the sword to the mild influence of the magistracy.

There is nothing absurd or impracticable in the idea of a league

 or alliance between independent nations for certain defined purposes

 precisely stated in a treaty regulating all the details of time,

 place, circumstance, and quantity; leaving nothing to future

 discretion; and depending for its execution on the good faith of

 the parties. Compacts of this kind exist among all civilized

 nations


 riveting the chains of slavery upon a part of their countrymen,

 direct their course, but to the seat of the tyrants, who had

 meditated so foolish as well as so wicked a project, to crush them

 in their imagined intrenchments of power, and to make them an

 example of the just vengeance of an abused and incensed people? Is

 this the way in which usurpers stride to dominion over a numerous

 and enlightened nation? Do they begin by exciting the detestation

 of the very instruments of their intended usurpations? Do they

 usually commence their career by wanton and disgustful acts of

 power, calculated to answer no end, but to draw upon themselves

 universal hatred and execration? Are suppositions of this sort the

 sober admonitions of discerning patriots to a discerning people? Or

 are they the inflammatory ravings of incendiaries or distempered

 enthusiasts? If we were even to suppose the national rulers

 actuated by the most ungovernable ambition, it is impossible to

 b

punish piracies and felonies committed on the high seas, and

offenses against the law of nations; to regulate foreign

commerce, including a power to prohibit, after the year 1808, the

importation of slaves, and to lay an intermediate duty of ten

dollars per head, as a discouragement to such importations. This

class of powers forms an obvious and essential branch of the

federal administration. If we are to be one nation in any

respect, it clearly ought to be in respect to other nations. The

powers to make treaties and to send and receive ambassadors,

speak their own propriety. Both of them are comprised in the

articles of Confederation, with this difference only, that the 

former is disembarrassed, by the plan of the convention, of an

exception, under which treaties might be substantially frustrated

by regulations of the States; and that a power of appointing and

receiving ``other public ministers and consuls,'' is expressly

and very properly added to the former provision


the case of our slaves, when it views them in the mixed character

of persons and of property. This is in fact their true

character. It is the character bestowed on them by the laws

under which they live; and it will not be denied, that these are

the proper criterion; because it is only under the pretext that

the laws have transformed the negroes into subjects of property,

that a place is disputed them in the computation of numbers; and

it is admitted, that if the laws were to restore the rights which

have been taken away, the negroes could no longer be refused an

equal share of representation with the other inhabitants. ``This

question may be placed in another light. It is agreed on all

sides, that numbers are the best scale of wealth and taxation, as

they are the only proper scale of representation. Would the

convention have been impartial or consistent, if they had

rejected the slaves from the list of inhabitants, when the shares

of representation were to be calculate


 RECESS OF THE LEGISLATURE OF ANY STATE, the Executive THEREOF may

 make temporary appointments until the NEXT MEETING OF THE

 LEGISLATURE, which shall then fill such vacancies.'' Here is an

 express power given, in clear and unambiguous terms, to the State

 Executives, to fill casual vacancies in the Senate, by temporary

 appointments; which not only invalidates the supposition, that the

 clause before considered could have been intended to confer that

 power upon the President of the United States, but proves that this

 supposition, destitute as it is even of the merit of plausibility,

 must have originated in an intention to deceive the people, too

 palpable to be obscured by sophistry, too atrocious to be palliated

 by hypocrisy.

I have taken the pains to select this instance of

 misrepresentation, and to place it in a clear and strong light, as

 an unequivocal proof of the unwarrantable arts which are practiced

 to prevent a fair and impartial judgment of the real 


 vigor, and how improbable it is that any considerable portion of the

 bench, whether more or less numerous, should be in such a situation

 at the same time, we shall be ready to conclude that limitations of

 this sort have little to recommend them. In a republic, where

 fortunes are not affluent, and pensions not expedient, the

 dismission of men from stations in which they have served their

 country long and usefully, on which they depend for subsistence, and

 from which it will be too late to resort to any other occupation for

 a livelihood, ought to have some better apology to humanity than is

 to be found in the imaginary danger of a superannuated bench.

PUBLIUS.

1 Vide ``Constitution of Massachusetts,'' chapter 2, section

 I, article 13.



FEDERALIST No. 80

The Powers of the Judiciary

From McLEAN's Edition, New York.



HAMILTON



To the People of the State of New York:

To JUDGE with accuracy of the proper extent of the federal

 judicature, it will be necessary

**Your Turn**: See the comments in the following cells and write the code needed to accomplish each task. Feel free to add more cells for increased readability.

In [4]:
# Task: Loop over lines and store the words as a list in file_data

# hint, file_data starts as an empty list. It is our container for keeping the file contents.
file_data = []

# Algorithm
# Step 1: open the Federalist file again, as shown in the examples above
# Step 2: loop over the lines
# We want to split on the ' ' (space) so we can divide each line into a list of words

# ------------ Add your code below --------------
with open(filepath, 'r') as fh:
    # loop over the lines
    for line in fh:
        # split on the ' ' (space) to divide the line into a list of words
        file_data.append(line.split(' '))


# ------------ =================== --------------

print(file_data[12])  # print the line at index 12 (the 13th line of the document)


['', 'nothing', 'less', 'than', 'the', 'existence', 'of', 'the', 'UNION,', 'the', 'safety', 'and', 'welfare\n']


In [None]:
# Test a method for detecting union in a line
'UNION,' in file_data[12]  # Note the comma. Just "UNION" without a comma will give False

How about the following? It checks whether the string 'UNION' is a substring of 'UNION,'.

In [5]:
'UNION' in 'UNION,'

True
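
The two checks behave differently because `in` on a list looks for an identical element, while `in` on a string looks for a substring. A minimal sketch contrasting them, using a hypothetical `words` list modeled on the line printed earlier:

```python
words = ['the', 'UNION,', 'the', 'safety']  # hypothetical list of words from one line

# List membership: True only if some element is exactly 'UNION'
exact_match = 'UNION' in words

# Substring test applied to each word in turn: True if any word contains 'UNION'
substring_match = any('UNION' in word for word in words)

print(exact_match)      # False
print(substring_match)  # True
```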

The above exercise was a sanity check: we can loop over the lines of a file, split each line into words, and check whether a particular word (here, `UNION,`) is present in a particular line (here, the line at index 12). Now, can we build upon the above code to accomplish the following tasks?

* Loop through the lines and split each line into words
* Check whether the word `union` occurs in a line (notice the word is in lower case)
    * If there is a match, count that line as one where the word has occurred
    * Also count the number of times the word has occurred

In [6]:
# Iterate through file data and 
# compute your counts in this cell
# ---------------------------------

target_word = 'union'

line_count = 0  # counter for num of lines in which the target word occurred in the entire document
word_count = 0  # counter for num of times the target word occurred in the entire document

for line in file_data:
    
    this_line_count = 0  # contains the number of times the target word occurred in the current line
    
    # ------------ Add your code below --------------
    
    # Loop through the array of words in this 'line'
    for word in line:
        # check if this word contains 'union' (case-insensitive)
        if target_word in word.lower():
            # if it's a match, increment this_line_count
            this_line_count += 1
    
    # at the end of the loop add this_line_count to word_count
    word_count += this_line_count
    
    # ------------ =================== --------------
    
    #if this_line_count isn't 0, line_count would increment by one    
    if this_line_count:
        line_count += 1
    # end if
    
# end for

print(f'#Lines: {line_count}; Words: {word_count}')


#Lines: 389; Words: 392


# Approximate Answers 

Your answers will vary based on your implementation and any extra steps taken.  The goal is the construction of the code and understanding of the file processing mechanics. Please do not stress over the finer points of case-insensitivity and punctuation at this point in time. You can revisit this later if you decide to tackle those challenges.
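
For anyone who does want to revisit punctuation and case later, one possible approach is to normalize each word before comparing it. This is a sketch under stated assumptions, not the expected solution: `is_target` is a hypothetical helper name, and the sample words are invented for illustration.

```python
import string

def is_target(word, target='union'):
    """Return True if word equals target after trimming punctuation/whitespace and lowercasing."""
    cleaned = word.strip(string.punctuation + string.whitespace).lower()
    return cleaned == target

# Hypothetical sample words, including two that merely contain 'union'
samples = ['UNION,', 'Union.\n', 'union', 'reunion', 'disunion;']
print([is_target(w) for w in samples])  # [True, True, True, False, False]
```

An exact comparison like this no longer counts words that merely contain the target, so its totals may differ from those produced by the substring check used in this notebook.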

#### Using terminal

```bash
# Lines with any case spelling of UNION
$ grep -i union /dsa/data/all_datasets/hamilton-federalist-548.txt | wc -l
389

# Instances of any case spelling of UNION
$ grep -i union /dsa/data/all_datasets/hamilton-federalist-548.txt  | sed -e 's/ /\n/g' |grep -i union | wc -l 
392
```

# SAVE YOUR NOTEBOOK