Pride and Prejudice uses some fairly non-standard English. In general, I broke sentences on a combination of punctuation marks, spaces and capital letters. I added caveats to avoid splitting on common abbreviations (and on some uncommon ones that do appear in Pride and Prejudice). Interjections are used often, so I chose to count one as its own sentence only if the next word starts with a capital letter. For example:
* "Oh! shocking!" counts as a single sentence, not two. 
* "Oh! Single, my dear, to be sure!" would count as two.
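
As a rough illustration of the capital-letter rule, here is a minimal regex sketch. It is not the notebook's `splitSentences` function: the names `BREAK` and `split_on_capitals` are made up for this example, and it ignores abbreviations entirely (so "Mr. Bennet" would be split).

```python
import re

# Assumed rule: a break is . ! or ? (plus an optional closing quote),
# then whitespace, then an optional opening quote and a capital letter.
BREAK = re.compile(r'([.!?][”’"]?)\s+(?=[“‘"]?[A-Z])')

def split_on_capitals(text):
    """Split text at each break point; hypothetical helper for illustration."""
    sentences, start = [], 0
    for m in BREAK.finditer(text):
        # keep the punctuation (group 1) with the sentence it ends
        sentences.append(text[start:m.end(1)])
        # skip the whitespace that separated the two sentences
        start = m.end()
    sentences.append(text[start:])
    return sentences

print(split_on_capitals('Oh! shocking!'))
print(split_on_capitals('Oh! Single, my dear, to be sure!'))
```

The first call returns one sentence because "shocking" is lower case; the second returns two.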

For dashes, I used the following rule to decide whether one ends a sentence. The dash:
* must be preceded by a period, ? or !, with optional quotation marks
* must be followed by optional quotation marks and a capital letter

*Note: this misses at least one pseudo-sentence. However, dropping the requirement for punctuation before the dashes would break up a lot of sentences that are clearly using dashes as commas.*
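
The dash rule can be sketched the same way. This is only an assumption about how it might look: I am guessing that dashes appear in the text as `--` or as a single long dash character, the names `DASH_BREAK` and `split_on_dashes` are hypothetical, and this version simply drops the dash at the break.

```python
import re

# Assumed rule: . ? or ! (plus an optional closing quote), then the dash,
# then an optional opening quote and a capital letter.
# \u2014 is the long dash character; "--" is the plain-text spelling.
DASH_BREAK = re.compile(r'([.!?][”’"]?)\s*(?:--|\u2014)\s*(?=[“‘"]?[A-Z])')

def split_on_dashes(text):
    """Split text at dashes that follow end punctuation; illustrative only."""
    sentences, start = [], 0
    for m in DASH_BREAK.finditer(text):
        # keep the end punctuation, drop the dash itself
        sentences.append(text[start:m.end(1)])
        start = m.end()
    sentences.append(text[start:])
    return sentences

print(split_on_dashes('He stopped.--Then he spoke, slowly--as ever.'))
```

The second dash is left alone because nothing like a period comes before it, which matches the "dashes as commas" case.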

For headers I:
* grabbed the title, byline and first chapter heading and deleted them
* deleted the remaining chapter headings
* deleted the asterisk separators: space(s) + asterisks + space(s)
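
A minimal sketch of the heading and asterisk clean-up, under loud assumptions: I am guessing that chapter headings sit on their own line as "Chapter" plus a number, and that separators are runs of asterisks with spaces around them. The names `CHAPTER_HEADING`, `ASTERISK_RUN` and `strip_headers` are hypothetical, and the title and byline are not handled because their format depends on the edition.

```python
import re

# Hypothetical patterns; the real heading format depends on the source text.
CHAPTER_HEADING = re.compile(r'^[ \t]*chapter[ \t]+\w+\.?[ \t]*$',
                             re.IGNORECASE | re.MULTILINE)
ASTERISK_RUN = re.compile(r'\s+\*+(?:\s+\*+)*\s+')

def strip_headers(text):
    """Remove chapter heading lines and asterisk separators; illustrative only."""
    # headings first, while the line structure is still intact
    text = CHAPTER_HEADING.sub('', text)
    # space(s) + asterisks + space(s) collapses to a single space
    text = ASTERISK_RUN.sub(' ', text)
    # finally squeeze the leftover blank lines and double spaces
    return re.sub(r'\s+', ' ', text).strip()

sample = ('Chapter 1\n\nIt is a truth.\n\nChapter 2\n\n'
          'Mr. Bennet replied. * * * * * This was enough.')
print(strip_headers(sample))
```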

I am quite sure this could be done more elegantly. Pretty much every post I read on the subject said we're crazy to try to build our own tokenizer. :)

In [75]:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re



def checkOh(inText):
    """
    Checks passed-in text to see if it matches the pattern "Oh! + Capitalized word," (Used in several places as part of quotes)
    
    Parameters
    ----------
    inText: str
    A string to be checked
      
    Returns
    -------
    Boolean
    The answer to the question: Does it match the pattern.
    """
        
 




test_str = ("""“Oh! Mary,” said she, “I wish you had gone with us, for we had such fun!
As we went along, Kitty and I drew up the blinds, and pretended there
was nobody in the coach; and I should have gone so all the way, if Kitty
had not been sick;""")


print(splitSentences(test_str))

['Elizabeth told her the motives of her secrecy.', 'She had been unwilling to mention Bingley; and the unsettled state of her own feelings had made her equally avoid the name of his friend.', 'But now she would no longer conceal from her his share in Lydia’s marriage.', 'All was acknowledged, and half the night spent in conversation.', '“Good gracious!” cried Mrs. Bennet, as she stood at a window the next morning, “if that disagreeable Mr. Darcy is not coming here again with our dear Bingley!', 'What can he mean by being so tiresome as to be always coming here?', 'I had no notion but he would go a-shooting, or something or other, and not disturb us with his company.', 'What shall we do with him?', 'Lizzy, you must walk out with him again, that he may not be in Bingley’s way.”']


In [81]:
cList = [1,2,3]
nList = 4 + 5 + 6

print(type(nList))
combined_list = zip(cList,nList)
print(list(combined_list))

<class 'int'>


TypeError: zip argument #2 must support iteration

In [1]:
import datetime

total = 93600

lastArrival = str(datetime.timedelta(seconds=total))
print("Last Arrival based on full precision: ", lastArrival)


Last Arrival based on full precision:  1 day, 2:00:00


In [11]:
c = [8.32, 6.02, 7.53]

n = [2.1, 1.65, 2.67]

combined = zip(c,n)

winners = [[c, n] for  c, n in  combined if c < 8]

print(winners)

[[6.02, 1.65], [7.53, 2.67]]


In [18]:
string = 'This is some text and this is more text'
punct = set('.!?')

def hasPunct(inString):  
    #any() already returns a boolean, so return it directly
    return any(c in punct for c in inString)
              
hasPunct(string)             

False

In [61]:
import string
def hasPunct(inString):
    """
    Returns a boolean to indicate if the string has potential sentence-ending punctuation: a period, !, ?, or a closing curly quote (” or ’)
    
    Parameters
    -----------
    inString: str
    The string to be evaluated
    
    Returns:
    Boolean
    The true or false answer to the question, does this string have punctuation.
    """
    punct = set('.!?”’')
    return any(c in punct for c in inString)

    
def matchesAbbreviations(inString):
    """
    Returns a boolean to indicate if the passed-in string matches one of the common abbreviations in Pride and Prejudice
    
    Parameters
    ---------------
    inString: str
    The string to be evaluated
    
    Returns:
    Boolean
    The true or false answer to the question, does this match any of these abbreviations: Mr. Dr. Ms. Mrs. Jr. Sr.  St. EDW. E. M. 
    """
    #strip the punctuation and make it lower case
    cleanedString = inString.lower()
    exclude = set(string.punctuation + '”’')
    cleanedString =  ''.join(ch for ch in cleanedString if ch not in exclude)
    
    #make a list of abbreviations
    abbrev = ['mr', 'dr', 'ms', 'mrs', 'jr', 'sr', 'st', 'edw', 'e', 'm']
    
    anyMatches = False
    for item in abbrev:
        if item == cleanedString:
            anyMatches = True

    return anyMatches

def checkNextWord(inWord):
    """
    Returns a boolean to indicate if the passed in word meets the definition of a sentence-starting word (assuming the prior word has potential sentence-ending punctuation)
    Definition: Capitalized or “+Capitalized or “_+Capitalized or ‘+Capitalized
    
    Parameters
    --------------
    inWord - str
    a word to be evaluated
    
    Returns
    ---------------
    Boolean
    The answer to the question, "Is this word the start of a new sentence?"
    
    """
    
    #look at the first character
    firstChar = inWord[0]
    if firstChar.isupper():
        return True
    else:
        punct = set("“‘")
        if firstChar in punct:
            #isupper must be called; the bare method reference is always truthy
            if inWord[1].isupper():
                return True
            if inWord[1] == '_':
                if inWord[2].isupper():
                    return True
                
    return False            
    

    
lineSplit = [' It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.',
 'However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.',
 '“My dear Mr. Bennet,” said his lady to him one day, “have you heard that Netherfield Park is let at last?”',
 'Mr. Bennet replied that he had not.',
 '“But it is,” returned she; “for Mrs. Long has just been here, and she told me all about it.”',
 'Mr. Bennet made no answer.',
 '“Do you not want to know who has taken it?” cried his wife impatiently.',
 '“_You_ want to tell me, and I have no objection to hearing it.”',
 'This was invitation enough.']

def splitSentencesNoRegex(inText):
    """
    Returns a list with one sentence per item in the list, without using regular expressions
    
    Parameters
    ----------
    inText: list of str
    A list of text lines to be broken into sentences
      
    Returns
    -------
    outList: list of str
    a list of sentences
    """
    #join the line split
    text = ' '.join(inText)

    #replace any double spaces with singles
    text = text.replace('  ', ' ')

    #split on space to get words
    words = text.split(' ')

    # make a new list. 
    outList = []
    #make a variable to track sentences
    sentence = ''
    #loop through each word
    for i, word in enumerate(words):
        if  hasPunct(word) == False:
            #it's just a word without punctuation, so it isn't ending a sentence. Add it to the sentence
            sentence = sentence + word + ' '
        else:
            #This might be the end of a sentence.     
            #is this the last word?
            if i == len(words) -1:
                sentence = sentence + word
                outList.append(sentence)
                sentence = ''
            else:
                #is the next word the start of a new sentence?
                nextWord = words[i + 1]
                if(checkNextWord(nextWord) == False):
                    #no upper letter starting the next word, so this isn't the end of the sentence
                    sentence = sentence + word + ' '
                else:
                    #is it one of the common abbreviations?
                    if matchesAbbreviations(word) == False:
                        #it's the end of the sentence
                        sentence = sentence + word
                        outList.append(sentence)
                        sentence = ''
                    else:
                        #it's an abbreviation such as Mr., so keep the word and continue the sentence
                        sentence = sentence + word + ' '
    return outList


splitSentencesNoRegex(lineSplit)    


[' It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.',
 'However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.',
 '“My dear Mr. Bennet,” said his lady to him one day, “have you heard that Netherfield Park is let at last?”',
 'Mr. Bennet replied that he had not.',
 '“But it is,” returned she; “for Mrs. Long has just been here, and she told me all about it.”',
 'Mr. Bennet made no answer.',
 '“Do you not want to know who has taken it?” cried his wife impatiently.',
 '“_You_ want to tell me, and I have no objection to hearing it.”',
 'This was invitation enough.']

In [66]:
def matchesAbbreviations(inString):
    """
    Returns a boolean to indicate if the passed-in string matches one of the common abbreviations in Pride and Prejudice
    
    Parameters
    ---------------
    inString: str
    The string to be evaluated
    
    Returns:
    Boolean
    The true or false answer to the question, does this match any of these abbreviations: Mr. Dr. Ms. Mrs. Jr. Sr.  St. EDW. E. M. 
    """
    #strip the punctuation and make it lower case
    cleanedString = inString.lower()
    exclude = set(string.punctuation + '”’“‘')
    cleanedString =  ''.join(ch for ch in cleanedString if ch not in exclude)
    
    #make a list of abbreviations
    abbrev = ['mr', 'dr', 'ms', 'mrs', 'jr', 'sr', 'edw', 'e', 'm','st']
    
    anyMatches = False
    for item in abbrev:
        if item == cleanedString:
            anyMatches = True
            break

    return anyMatches


matchesAbbreviations("“Mr.")

True

In [68]:
text = "This is some text . "
text = text.strip()
print(text)

This is some text .


In [71]:
clist = [1.2, 3.4, 5.6]
nelist = []
for i in range(3):
    print(clist[i])

import numpy 
nevalues = numpy.random.normal(0,1,50)
print (nevalues)


1.2
3.4
5.6
[-0.00343842  1.5347867  -1.50867052  0.37268242 -0.0636173   0.591985
 -1.09790392 -1.18421192 -1.43472874 -0.87464945  0.25369501 -0.6645293
  0.40805257  0.74160696  0.37795736  0.97893021 -0.92628278  0.13333273
  0.93037951  1.3165168  -0.51250703  0.28063798 -0.24283931  0.10857201
  1.50913518  1.4176537  -1.08773498  0.5652114   0.85369684 -1.04573277
  1.76616156  0.7624037   0.02818238 -0.43286235 -0.4068066  -2.68587779
 -1.12988916  0.1086885  -1.07484803  0.3322055  -0.51593236 -0.11757036
  0.10904964  1.15996236 -1.0615955  -0.45782917  0.70794826  1.09528412
  0.09125677 -0.12603431]


In [86]:
mylist1 = [1.2, 3.4, 5.6]
mylist2 = [5.4, 6.6, 4.1]

combined = list(zip(mylist1, mylist2))
print(combined)

#access items within the tuples
abiggerthanb = [myTuple for myTuple in combined if myTuple[0] > myTuple[1]]

abiggerthanb2 = [(a,b) for (a,b) in combined if a > b]

[(1.2, 5.4), (3.4, 6.6), (5.6, 4.1)]
