# Counting the lines of each file with Python

## 1) Command Line

In [2]:
! ls -1 data

bad_date.csv
bookings.csv.bz2
bookings.sample.csv.bz2
bookings.sample.csv.csv
new_sample.bookings.csv
new_sample.bookings.csv.bz2
searches.clean.csv.bz2
searches.clean.no_dupl.csv
searches.csv.bz2
searches.sample.csv.bz2
searches.sample.csv.csv
top_airports2.csv
top_airports.csv


In [4]:
!bzcat ./data/bookings.sample.csv.bz2 | wc -l

10000
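`wc -l` counts newline characters. Iterating over a file in Python also yields a final line that has no trailing newline, so the two counts can differ by one. Both methods report 10000 for this sample, which is consistent with the file ending in a newline. Here is a small illustration on made-up in-memory data (the helper names are hypothetical):

```python
import io

def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")

def count_iterated_lines(data: bytes) -> int:
    """Count the lines that Python's file iteration yields."""
    return sum(1 for _ in io.BytesIO(data))

with_trailing = b"a,1\nb,2\n"
without_trailing = b"a,1\nb,2"  # last line has no newline

print(count_newlines(with_trailing), count_iterated_lines(with_trailing))        # 2 2
print(count_newlines(without_trailing), count_iterated_lines(without_trailing))  # 1 2
```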


## 2) Python

#### 2a) Python, without decompressing the file first

In [6]:
import bz2

In [12]:
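FileNameBz2 = './data/bookings.sample.csv.bz2'

In [13]:
# BZ2File decompresses on the fly, so iterating over it yields decompressed lines
with bz2.BZ2File(FileNameBz2) as fileBz2:
    k = 0
    for line in fileBz2:
        k += 1
print("%s has %s lines." % (FileNameBz2, k))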

./data/bookings.sample.csv.bz2 has 10000 lines.
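When only the count is needed, you do not have to split the data into lines at all. A minimal sketch, assuming Python 3.5+: read the decompressed stream in fixed-size binary chunks and count newline bytes. `count_lines_bz2` and the 1 MiB chunk size are illustrative choices. This counts the way `wc -l` does, so an unterminated last line is not counted, unlike the loop above.

```python
import bz2
import os
import tempfile

def count_lines_bz2(path, chunk_size=1 << 20):
    """Count newline bytes in a .bz2 file without writing it out decompressed."""
    count = 0
    with bz2.open(path, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" (end of stream)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count

# Demo on a throwaway compressed file with 1000 short rows
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv.bz2")
    with bz2.open(path, "wb") as f:
        f.write(b"".join(b"row %d\n" % i for i in range(1000)))
    demo_count = count_lines_bz2(path)

print(demo_count)  # 1000
```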


#### 2b) Python on the raw (uncompressed) file

In [14]:
FileName='./data/bookings.sample.csv.csv'

In [15]:
with open(FileName, "r") as file_input:
    k = 0
    for line in file_input:
        k += 1  # one increment per line yielded by the file iterator
print("%s has %s lines." % (FileName, k))

./data/bookings.sample.csv.csv has 10000 lines.
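In Python 3, text mode (`"r"`) decodes every line using the default encoding, which costs time and can raise `UnicodeDecodeError` on files in an unexpected encoding. A sketch of a hypothetical `count_lines` helper that opens in binary mode and picks `bz2.open` or the built-in `open` from the file extension, so compressed and raw files are handled the same way (Python 3.3+ assumed):

```python
import bz2
import os
import tempfile

def count_lines(path):
    """Count lines in a plain or .bz2-compressed file (hypothetical helper).

    Binary mode skips text decoding; a final line without a newline still counts.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        return sum(1 for _ in f)

# Demo: identical content stored raw and compressed (made-up data)
with tempfile.TemporaryDirectory() as tmp:
    body = b"id;price\n1;9.5\n2;\xe9\n"  # last row is not valid UTF-8
    plain = os.path.join(tmp, "sample.csv")
    packed = os.path.join(tmp, "sample.csv.bz2")
    with open(plain, "wb") as f:
        f.write(body)
    with bz2.open(packed, "wb") as f:
        f.write(body)
    counts = (count_lines(plain), count_lines(packed))

print(counts)  # (3, 3)
```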


In [16]:
with open(FileName, "r") as file_input:
    k = -1  # so that k + 1 == 0 if the file is empty (otherwise k is undefined)
    for k, line in enumerate(file_input):
        pass
print("%s has %s lines." % (FileName, k + 1))

./data/bookings.sample.csv.csv has 10000 lines.
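`enumerate` also accepts a `start` argument. Starting at 1 lets the last index be the count itself, and a default of 0 covers the empty-file case. A minimal sketch (`count_lines_enumerate` is a hypothetical name), tested on in-memory text:

```python
import io

def count_lines_enumerate(lines):
    """Count the items of an iterable of lines; returns 0 when it is empty."""
    n = 0  # stays 0 if the loop body never runs
    for n, _ in enumerate(lines, start=1):
        pass
    return n

three = count_lines_enumerate(io.StringIO("a\nb\nc\n"))
empty = count_lines_enumerate(io.StringIO(""))
print(three, empty)  # 3 0
```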


In [17]:
with open(FileName, "r") as file_input:
    # A generator expression counts lines without first building a list of 1s
    row_count = sum(1 for row in file_input)
print("%s has %s lines." % (FileName, row_count))

./data/bookings.sample.csv.csv has 10000 lines.
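All of these methods count physical lines. In a CSV file, a quoted field can contain a newline, so one record can span several lines. If you need the number of records, you can let the `csv` module do the parsing. The sketch below assumes a single header row, and the sample data is invented:

```python
import csv
import io

def count_csv_records(f, has_header=True):
    """Count CSV records (not physical lines) in an open text file."""
    n = sum(1 for _ in csv.reader(f))
    return n - 1 if has_header and n > 0 else n

sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'
physical = sum(1 for _ in io.StringIO(sample))
records = count_csv_records(io.StringIO(sample))
print(physical, records)  # 4 2
```

With real files, open them with `newline=""`, as the `csv` documentation recommends, so that newlines inside quoted fields are preserved.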


## 3) What if the file doesn't exist? Use try/except

In [69]:
FileName='ch_df01.ipynb'

In [71]:
try:
    with open(FileName, "r") as file_input:
        k = -1  # handles an empty file
        for k, line in enumerate(file_input):
            pass
    print("%s has %s lines." % (FileName, k + 1))
except IOError:  # in Python 3, IOError is an alias of OSError
    print("Error! File %s did not open!" % FileName)

Error! File ch_df01.ipynb did not open!
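In Python 3.3+, `IOError` is an alias of `OSError`, and a missing file raises the more specific subclass `FileNotFoundError`. A sketch that separates "missing" from other open failures and returns `None` instead of printing a count (`count_lines_or_none` is a hypothetical name):

```python
import os
import tempfile

def count_lines_or_none(path):
    """Return the number of lines in `path`, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            return sum(1 for _ in f)
    except FileNotFoundError:
        print("Error! File %s does not exist!" % path)
    except OSError as exc:  # e.g. permission denied, path is a directory
        print("Error! File %s did not open: %s" % (path, exc))
    return None

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "present.csv")
    with open(existing, "wb") as f:
        f.write(b"a\nb\n")
    found = count_lines_or_none(existing)
    missing = count_lines_or_none(os.path.join(tmp, "ch_df01.ipynb"))

print(found, missing)  # 2 None
```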


## 4) What if "each file" means every CSV file? Count the lines of all files matching a pattern with the glob module (the example below uses `*.ipynb`)

In [65]:
def number_of_line_csv(filename):
    """Return the number of lines in `filename` (0 for an empty file)."""
    k = -1
    with open(filename, "r") as file_input:
        for k, line in enumerate(file_input):
            pass
    return k + 1

In [66]:
import glob
files_to_read = glob.glob("*.ipynb")  # use "*.csv" to count CSV files instead
for file_name in files_to_read:
    print("number of lines in %s : %d" % (file_name, number_of_line_csv(file_name)))

number of lines in Web services solutions.ipynb : 244
number of lines in Solution for exercise 2.ipynb : 209
number of lines in Solution for challenge.ipynb : 643
number of lines in ch_01.ipynb : 533
number of lines in Solution for exercise 3.ipynb : 194
number of lines in Solution for consume web service.ipynb : 443
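Because `glob` returns names in arbitrary order, sorting them makes the report reproducible. The sketch below collects the counts in a dict. `line_counts` is a hypothetical name; in this notebook you would call it as `line_counts("data/*.csv")`. The demo files are invented.

```python
import glob
import os
import tempfile

def line_counts(pattern):
    """Map each file matching `pattern` to its line count, in sorted path order."""
    counts = {}
    for path in sorted(glob.glob(pattern)):  # glob order is arbitrary, so sort it
        with open(path, "rb") as f:
            counts[path] = sum(1 for _ in f)
    return counts

with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("b.csv", b"1\n2\n"), ("a.csv", b"x\n"), ("notes.txt", b"z\n")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(body)
    demo = {os.path.basename(p): n
            for p, n in line_counts(os.path.join(tmp, "*.csv")).items()}

print(demo)  # {'a.csv': 1, 'b.csv': 2}
```

On Python 3.5+, `glob.glob("data/**/*.csv", recursive=True)` would also descend into subdirectories.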


In [49]:
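import glob
print(glob.glob(""))       # empty pattern: no matches
print(glob.glob("*"))      # every non-hidden entry in the current directory
print(glob.glob("*.bz2"))  # only names ending in .bz2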

[]
['Web services solutions.ipynb', 'Solution for exercise 2.ipynb', 'searches.csv.bz2', 'Solution for challenge.ipynb', 'ch_01.ipynb', 'bookings.csv.bz2', 'Solution for exercise 3.ipynb', 'Solution for consume web service.ipynb']
['searches.csv.bz2', 'bookings.csv.bz2']
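The wildcard rules described in the documentation can be checked on a scratch directory. Most of the file names below come from the data listing at the top of this notebook. `.hidden.csv` is invented to show that `*` does not match names starting with a dot. The `match` helper is hypothetical.

```python
import glob
import os
import tempfile

def match(directory, pattern):
    """Return the sorted base names in `directory` that match `pattern`."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(directory, pattern)))

names = ["bookings.csv.bz2", "searches.csv.bz2", "top_airports.csv",
         "top_airports2.csv", ".hidden.csv"]
with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), "w").close()  # create an empty file
    star = match(tmp, "*.bz2")                  # '*': any run of characters
    one_char = match(tmp, "top_airports?.csv")  # '?': exactly one character
    char_range = match(tmp, "[a-m]*.bz2")       # '[a-m]': one character in the range
    csvs = match(tmp, "*.csv")                  # '*' skips names starting with '.'

print(star)        # ['bookings.csv.bz2', 'searches.csv.bz2']
print(one_char)    # ['top_airports2.csv']
print(char_range)  # ['bookings.csv.bz2']
print(csvs)        # ['top_airports.csv', 'top_airports2.csv']
```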


From the Python 2 `glob` documentation (https://docs.python.org/2/library/glob.html):

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched. 

For a literal match, wrap the meta-characters in brackets. For example, '[?]' matches the character '?'.
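`glob` applies the same shell-style matching to each path component that the `fnmatch` module exposes directly. This makes the bracket trick easy to check without creating files. On Python 3.4+, `glob.escape` does the wrapping for you. The file name `report?.csv` is invented.

```python
import fnmatch
import glob

# '[?]' is a character class holding only '?', so it matches a literal '?'
literal = fnmatch.fnmatchcase("report?.csv", "report[?].csv")
other = fnmatch.fnmatchcase("report1.csv", "report[?].csv")

# glob.escape wraps each '*', '?' and '[' in brackets
escaped = glob.escape("report?.csv")

print(literal, other, escaped)  # True False report[?].csv
```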

`glob.glob(pathname)`: Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).

`glob.iglob(pathname)`: Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
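`iglob` fits a running total well, because each path is processed as soon as it is yielded and the full list of names is never kept. The directory itself is still scanned. A sketch with a hypothetical `total_lines` helper and invented `part*.csv` files:

```python
import glob
import os
import tempfile

def total_lines(pattern):
    """Sum line counts over files matching `pattern`, one path at a time."""
    total = 0
    for path in glob.iglob(pattern):  # yields paths lazily instead of returning a list
        with open(path, "rb") as f:
            total += sum(1 for _ in f)
    return total

with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        with open(os.path.join(tmp, "part%d.csv" % i), "wb") as f:
            f.write(b"a\nb\n")
    demo_total = total_lines(os.path.join(tmp, "part*.csv"))

print(demo_total)  # 6
```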

In [5]:
glob.glob("*.bz2")

['searches.csv.bz2', 'bookings.csv.bz2']