regarding preprocessing dataset request #1

karthikeyana · 2015-03-04T19:37:16Z

Can you post the data preprocessing program on our blog?

bingweiliu · 2015-03-04T19:41:04Z

Karthikeyana, what do you mean by posting the program to your blog? Where is your blog? The pre-processing program is a simple script that puts every interview on one line and removes unneeded items.

karthikeyana · 2015-03-04T20:02:37Z

import csv
import glob
import os

# Python 2 script: convert every .txt file in the input folder to a .csv file.
directory = raw_input("INPUT Folder:")
output = raw_input("OUTPUT Folder:")

txt_files = os.path.join(directory, '*.txt')

for txt_file in glob.glob(txt_files):
    filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
    # Keep the input file open while writing, because csv.reader reads lazily.
    with open(txt_file, "rb") as input_file:
        in_txt = csv.reader(input_file, delimiter='=')
        with open(os.path.join(output, filename), 'wb') as output_file:
            out_csv = csv.writer(output_file)
            out_csv.writerows(in_txt)

Sir, I am using this code to convert all .txt files to .csv, but the output is not in the format below. Please help me.

:POS: :41: i disagree with the reviewers who said the movie was predictable and
drawn out it was a movie with heart and you could feel the main characters plight
when he lost his companion being an animal lover i was pulling for the happy
ending of course i am disney s biggest fan and i love this movie right along with
the others p s i am a grandmother to eleven thank heavens for disney movies
:POS: :85: sit back and enjoy the interesting and exciting story of the count of
monte cristo great rainy day movie
:POS: :95: a very well done film and an excellent cast i d put it right up with the
three and four musketeers movies york reed chamberlain heston etc
:POS: :96: this is an excellent movie and i never read the book the acting and the
plot was very nice done it is one of my favorite movies
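
The target format shown above (`:LABEL: :ID:` followed by lowercase text with punctuation replaced by spaces, one review per line) could be produced along these lines. This is a minimal sketch in Python 3, not the project's actual preprocessing script; the function name `to_one_line` and the sample input are made up for illustration.

```python
import re


def to_one_line(label, review_id, text):
    """Return one review as ':LABEL: :ID: cleaned text' on a single line."""
    # Lowercase, then turn every run of non-alphanumeric characters
    # (punctuation, apostrophes, newlines) into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return ":{}: :{}: {}".format(label, review_id, cleaned)


# Hypothetical raw review spanning two lines.
raw = "P.S. I'd put it right up\nwith Disney's best!"
print(to_one_line("POS", 41, raw))
# :POS: :41: p s i d put it right up with disney s best
```

Writing one such line per review to a plain text file, without going through `csv.writer`, avoids any quoting being added to the output.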

karthikeyana · 2015-03-04T20:34:52Z

Sir, can you post the script in the comment box?

karthikeyana · 2015-03-04T21:11:23Z

15/03/05 02:26:39 INFO input.FileInputFormat: Total input paths to process : 2
15/03/05 02:26:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/03/05 02:26:39 WARN snappy.LoadSnappy: Snappy native library not loaded
15/03/05 02:26:40 INFO mapred.JobClient: Running job: job_201503042232_0030
15/03/05 02:26:41 INFO mapred.JobClient: map 0% reduce 0%
15/03/05 02:26:59 INFO mapred.JobClient: map 100% reduce 0%
15/03/05 02:27:16 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/03/05 02:27:16 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000001_0, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/03/05 02:27:26 INFO mapred.JobClient: map 100% reduce 6%
15/03/05 02:27:28 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/03/05 02:27:29 INFO mapred.JobClient: map 100% reduce 0%
15/03/05 02:27:29 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000001_1, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/03/05 02:27:37 INFO mapred.JobClient: map 100% reduce 3%
15/03/05 02:27:38 INFO mapred.JobClient: map 100% reduce 6%
15/03/05 02:27:39 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000000_2, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/03/05 02:27:40 INFO mapred.JobClient: map 100% reduce 3%
15/03/05 02:27:40 INFO mapred.JobClient: Task Id : attempt_201503042232_0030_r_000001_2, Status : FAILED
java.lang.NumberFormatException: For input string: "1""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:26)
at com.ift.hadoop.NBTrainingReducer.reduce(NBTrainingReducer.java:8)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

karthikeyana · 2015-03-04T21:13:39Z

This is the error message I get when running on single-node Hadoop.
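
One possible cause, offered as a guess rather than a diagnosis: the failing value `1"` ends in a double quote, which is what appears when `csv.writer` wraps a whole line in quotes because it contains a comma. The Python 3 sketch below reproduces that effect; the tab-separated line layout and the sample text are assumptions, since the input format expected by `NBTrainingReducer` is not shown in this thread.

```python
import csv
import io

# Hypothetical input line: text, a tab, then a count.
field = "great, fun movie\t1"

# csv.writer quotes any field that contains the delimiter (a comma here).
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([field])
line = buf.getvalue().rstrip("\n")
print(repr(line))   # '"great, fun movie\t1"'

# A consumer that splits on tabs now sees a trailing quote on the count.
last = line.split("\t")[-1]
print(repr(last))   # '1"'

# Parsing that token as an integer fails, much like Integer.valueOf in Java.
try:
    int(last)
    parsed = True
except ValueError:
    parsed = False
print(parsed)       # False
```

If this is what happened, checking the converted files for stray `"` characters would confirm it.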
