Error: literal carriage return found in data #523

Closed
lanphan opened this issue Mar 15, 2016 · 4 comments

lanphan commented Mar 15, 2016

Hi all,

I got the error below when running "deepdive do sentences" in the quickstart example (the "has spouse" example) with the full dataset from signalmedia (1 million records):

2016-03-15 19:13:27.915991 Parsing document 1060ad64-521f-46c7-a804-4181d97f9bf0...
2016-03-15 19:13:29.000338 ERROR:  literal carriage return found in data
2016-03-15 19:13:29.000385 HINT:  Use "\r" to represent carriage return.
2016-03-15 19:13:29.000395 CONTEXT:  COPY dd_tmp_sentences, line 1555167

Content of data from document 1060ad64-521f-46c7-a804-4181d97f9bf0 is:

Taris Savell, a reporter for radio, television and the Pensacola News Journal, is most remembered by many people for her celebrity interviews and for the celebrities she attracted to the Pensacola area for festivals and charity events. Taris will soon be leaving the Pensacola area she loves to move closer to family in the Baton Rouge. (Photo: Bruce Graner/bgraner@pnj.com) 

Taris Savell was in control of the interview. She hand-picked the photographer — “I want Bruce Graner.” And once Bruce arrived at the Life Care Center nursing home on Olive Road where Savell has been recuperating since an August fall, she took charge. 

“Let me know when you’re shooting, Bruce,” she said. “Because I want to pose. Just don’t blindside me.” 

Then, she tells Bruce and me that after we interview her, she is going to interview us for her weekly radio show on WPNN. No request. It was an order. 

Savell is allowed a little diva-like behavior. After all she learned from the best as a celebrity interviewer and reporter long before the days of E! News, the Kardashians and either Paris or Perez Hilton. In her decades in Pensacola, working for a variety of media outlets — including the Pensacola News Journal — Savell interviewed them all, from Elvis to Bette Davis to Bob Hope to Gerald Ford and dozens and dozens more. She interviewed astronauts and actors and musicians and the grand dames of Hollywood. Her vast collection of celebrity memorabilia is so vast, she sold off much of it in an estate sale in 2001. 

Taris Savell often interviewed comedian and avid golfer Bob Hope on his visits to the Pensacola Bay Area. Hope was unfailingly generous with his time, from the first interview when Savell was a novice reporter to years later when Hope came to Pensacola to film a television special aboard the USS Lexington.   (Photo: News Journal file photo) 

But now comes the final curtain on Savell’s long, illustrious radio/television/newspaper career in Pensacola. The injury — she fell and broke her left femur — has prompted her to retire and move to Baton Rouge, Louisiana, to be near cousins. Savell, who never married and has no children, has no family members in Pensacola. 

She leaves today, the day of her last broadcast of the Joe Patti Seafood Interview Beat with Taris Savell. The program airs at 9 a.m. on WPNN-AM 790. 

Still, with her radio-ready voice still intact, she’s already hoping for a gig or two in Louisiana. 

“They’re already setting up voice-over work for me,” she said. “So I’ll have a little work.” 

Taris Savell, right, and Helen Brown-Galloway look over some of the items that were to be sold at Savell's estate sale. Savell conducted the last interview with Jayne Mansfield before she was killed in a car accident.   (Photo: News Journal file photo) 

But her time in Pensacola is over. Savell and her family moved from Selma, Ala., when she was 6 years old, and she has been here ever since. 

Just don’t ask her how long that’s been. Because she’s not giving out her age no matter how hard you try. 

“I’ll give you the answer Bette Davis gave me,” she said. “I’m young enough to want to tell, but old enough to know better.” 

Or, the answer she often gives when delivering speeches: 

“I was here when DeLuna landed the first time.” 

After attending Louisiana State University, Savell returned to Pensacola and landed her first media job — writing commercials for WBSR. She also hosted a what-was-then-called “women’s issues” program, “Femme Fatale. Topics generally included cooking and fashion. She moved to radio station WCOA, where she worked as a “she-jay” spinning records. Soon, she was interviewing nearly every celebrity who came to town, an occupation she would continue throughout her life. One of her first interviews was with legendary television personality Arthur Godfrey in the late 1950s. 

“It was horrible,” she said. “It was all yes and no answers.” 

Taris Savell has always seemed to have a talent for getting interview with national or international celebrities. In a magazine clipping from a story on Taris herself she is shown interviewing Doris Day, left.   (Photo: Bruce Graner/bgraner@pnj.com) 

Next, was legendary ventriloquist Edgar Bergen and by 1962, the celebrity interviewer was garnering some celebrity of her own. She was featured in 1962 magazine edition of TV Radio Mirror, a magazine about celebrities and their projects. The article, titled “A Hometown Gal,” featured photos of a smiling Savell with various luminaries of the day. 

She was a pioneer in radio and television at a time when men dominated the business. Still, she never really took to “women’s liberation.” 

“People asked me if I believe in women’s lib,” she said. “No, just ad lib. Because that’s what I did. I never used notes.” 

She hosted programs and celebrity interviews on WEAR-TV, capturing some of the biggest stars in the universe. 

Savell even interviewed a 21-year-old Elvis Presley at the height of his stardom in February, 1956, at the Pensacola Municipal Auditorium. 

“The first time I met him, I was nervous,” Savell recalled. “I had this heavy, bulky tape recorder and he just kept walking around the dressing room, and I kept following him around. He suddenly stopped and reached over an pulled an eyelash that had come out. To this day, I have the eyelash.” 

She met Elvis again years later, when she was escorting Pensacola beauty pageant contestants to a gala in California. Elvis expressed interest in one of the Pensacola contestants and wanted to go out with her, but Savell told him, “not without a chaperone.” The date never happened, and the beauty contestant “hasn’t spoken to me since.” 

Taris Savell, a reporter for radio, television and the Pensacola News Journal, is most remembered by many people for her celebrity interviews and for the celebrities she attracted to the Pensacola area for festivals and charity events. Taris will soon be leaving the Pensacola area she loves to move closer to family in the Baton Rouge. Some of Taris' celebrity exploits were covered in the past issue of TV/Radio Mirror Magazine and included photos of Taris with Doris Day, some of the actors from the Gunsmoke TV series among others.   (Photo: Bruce Graner/bgraner@pnj.com) 

She interviewed vice-presidents who would become presidents — Richard Nixon and Gerald Ford. (Ford even gave her flowers.) 

She interviewed Bob Hope on numerous occasions. 

“But he could never remember my name,” she said. “He would see me and say ‘There she is.” 

Doris Day invited her to her hotel room one morning for breakfast. For years, she booked the celebrity guests at the now-gone St. Anne’s Round-Up in Bellview, bringing in guests from Angie Dickinson - “one of the best; just terrific” — to Tony Danza - “the worst; just a (expletive.)” 

She interviewed Ernest Borgnine, Eva Gabor and even Ed Sullivan. But there were three celebrities that were on her wish list that escaped her: Queen Elizabeth, Tom Selleck and Barbra Streisand. 

She has no interest in the celebrity factory of today. 

“The whole medium has changed,” she said with a scoff. “There’s not a star I’m interested in.” 

What about the Kardashians? 

“Please! They’re (expletive, again.)” 

Savell might be working on walking again, but her mind and wit are as sharp as ever. The young lady from Pensacola High School did pretty good with herself. 

What year did you graduate, by the way? 

“Give it up, Troy. Don’t even try.” 

Any parting words for your Pensacola fans? 

“I have a pillow on my couch that says it all — “Screw the golden years.” 

Read or Share this story: http://on.pnj.com/1Fq8w3s

Googling around, I see that this is an error from PostgreSQL's COPY command. I have some questions:
1/ Is there any way to pre-process the data to prevent this error from happening again?
2/ Are there other known errors like this, so that we can collect them and create one pre-processing step that prevents all of them at once?
3/ Does DeepDive have any mechanism to log these errors and skip the offending records so that the run can continue?

Contributor

netj commented Mar 15, 2016

  1. You can preprocess the data in any way you like if you have a .sh script under input/, as we do with input/articles.tsv.sh.
  2. The doc_id that appeared before the ERROR line wasn't necessarily the input being loaded at that moment, since the documents are all processed in parallel. With a quick grep '\\r' *.jsonl you can see that many articles contain \r. To filter these, we actually have a gsub in input/articles.tsv.sh of the spouse example that takes care of this. Maybe we should add a comment to make this more apparent.
  3. DeepDive currently doesn't checkpoint within a single "process," but it does once you finish and move on to the next one. However, this would be a useful feature for development. We'll think about how this could be done without sacrificing efficiency too much. For other types of compute resource drivers that are coming along, this partitioning and checkpointing will become the default so you can run and resume things idempotently.

The intended way to run the whole signalmedia-1m corpus is to put the directory under input/ so the file sits at input/signalmedia/signalmedia-1m.jsonl. That way, input/articles.tsv.sh will pick up the .jsonl file and apply the necessary filters. You may want to remove the grep commands so that no articles are dropped. Please reopen if you find more issues.
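As a rough way to check the preprocessed output before loading it, a sketch like the following counts the lines that still carry a carriage return. The helper name count_cr_lines is made up for illustration and is not part of DeepDive. It looks for both a raw CR byte and the two-character escape sequence \r, on the assumption that either form may end up in the TSV.

```shell
# Hypothetical helper (not part of DeepDive): count the lines on stdin that
# contain a raw carriage-return byte (0x0d) or the two-character escape
# sequence "\r" (a backslash followed by the letter r).
count_cr_lines() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the helper from failing then.
    grep -c -e "$(printf '\r')" -e '\\r' || true
}
```

For example, count_cr_lines < input/articles.tsv should print 0 for a file that is clean in both senses. The check is coarse: a genuine backslash followed by the letter r in the text is counted as well.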

netj closed this as completed Mar 15, 2016

Author

lanphan commented Mar 15, 2016

@netj
I only commented out "head -100 |" in order to get all the relevant data.
I think the current gsub is not enough; see the attached file below for the result of grep '\\r' input/articles-1m.tsv, where articles-1m.tsv is the output of input/articles.tsv.sh.

still_contains_r.txt

Author

lanphan commented Mar 15, 2016

@netj
I renamed "still_contains_r.txt" to "articles.tsv" and ran "deepdive do sentences", and I got the same error.
Would you please reopen this issue? (I don't see an Open or Reopen button here.)

PS: thanks to your feedback, I now understand that input/articles.tsv.sh also serves as a pre-processing step for DeepDive (it runs inside DeepDive, not as a separate step).

Contributor

netj commented Mar 15, 2016

Yes, I confirmed there seem to be issues with the existing \r handling within jq. I'll fix this ASAP and update here. Meanwhile, you could just add a good old sed line, which is probably going to be safe and more complete than before, keeping carriage-return-phobic PostgreSQL happy:

cat "$corpus" |
#grep -E 'wife|husband|married' |
#head -100 |
jq -r '[.id, .content] | @tsv' |
# take care of carriage returns
sed 's/\\r//g'
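
A slightly more defensive variant of the same idea, as a sketch only: the helper name strip_cr is hypothetical, and it assumes that carriage returns can reach the TSV either as raw CR bytes or as the escape sequence \r that jq's @tsv is documented to write. It removes both forms.

```shell
# Hypothetical helper (not part of DeepDive): remove carriage returns from
# TSV read on stdin.
strip_cr() {
    # tr deletes raw CR bytes (0x0d); sed then deletes the two-character
    # escape sequence "\r" (a backslash followed by the letter r).
    tr -d '\r' | sed 's/\\r//g'
}
```

It could take the place of the final sed line in the pipeline above, i.e. `jq -r '[.id, .content] | @tsv' | strip_cr`. One caveat: a genuine backslash followed by the letter r in the article text, which @tsv would write as \\r, is also altered by the sed step.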
